Issue 91562

Summary: Specify what happens to the meta data of text document entities when the document is edited
Product: Writer Reporter: Oliver-Rainer Wittmann <orw>
Component: code Assignee: stefan.baltzer
Status: CLOSED FIXED QA Contact: issues@sw <issues>
Severity: Trivial    
Priority: P3 CC: discoleo, issues, mst.ooo, svante.schubert
Version: OOo 3.0 Beta   
Target Milestone: ---   
Hardware: All   
OS: All   
Issue Type: FEATURE Latest Confirmation in: ---
Developer Difficulty: ---
Issue Depends on:    
Issue Blocks: 91561    

Description Oliver-Rainer Wittmann 2008-07-11 07:54:40 UTC
Develop specification about the handling of meta data of text document entities
represented in the Writer core (paragraphs, headings, bookmarks, lists, list
items, etc.) when certain user actions are performed to such entities.
Comment 1 Oliver-Rainer Wittmann 2008-08-08 09:37:10 UTC
Done - please find the information here -->
http://wiki.services.openoffice.org/wiki/Writer/Metadata_Support#Handling_of_metadata_on_editing.2Fupdating_an_text_document
Comment 2 discoleo 2008-08-08 09:58:24 UTC
Having briefly read the specification, there are some things that worry me. I
might have misunderstood the specification.

How does CUT/COPY/PASTE behave?

More precisely, is the META-data lost?
This would be bad.

Although the meta-data content will be often accessed/extracted by external
applications (scripts, other programs), user actions like CUT/COPY/PASTE need to
preserve this meta-data, too. I hope this is kept in mind when implementing this
issue.

The point is, the major use of meta-data is in *reusing content*, so there is
really no difference between manual reuse (cut/copy/paste) and automatised reuse
(meta-data extraction by external applications). If the text-object loses its
meta-data during cut/copy/paste, then the most useful feature gets lost: the
meta-data would be usable only once.

Sometimes it might be necessary to strip away all/some meta-data, BUT this is a
completely different issue (and needs to be addressed within the security
framework). This is not an issue of meta-data per se.

So, I would keep these 2 issues separate and always preserve the meta-data.
Comment 3 Oliver-Rainer Wittmann 2008-08-08 10:45:01 UTC
ad discoleo:
Thanks for the fast feedback.

I am not sure if I understand your concern correctly.
The metadata itself is not touched by the actions cut/copy/paste of a certain
entity having metadata, regardless of whether the action is performed via the
user interface or the UNO-API. It is the metadata reference of the entity that
is affected by these actions.
Think of the following text document:
Text document containing two paragraphs, each having metadata. Each paragraph
refers to its metadata via a metadata reference - mainly the value of the ODF
property xml:id.
Action 1: When the user cuts the first paragraph, the metadata is still in the
RDF/RDFa repository and can be accessed. But no entity of the text document
references this metadata any longer.
Action 2: The first paragraph is copied and pasted as a new third paragraph. The
newly inserted third paragraph will lose its metadata reference, because the
referenced metadata is already referenced by the first paragraph.
Action 3: The first paragraph is cut and pasted as the new third paragraph. The
newly inserted third paragraph will keep its metadata reference and still has
metadata, because this metadata is still in the RDF/RDFa repository of the text
document.
Action 4: The first paragraph is copied and pasted as the first paragraph into a
newly created text document. The newly inserted first paragraph in the new text
document will have a metadata reference, but this metadata reference can not be
resolved inside the new text document, because the RDF/RDFa repository of the
new text document does not contain any metadata.
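The four actions above can be sketched as a toy model. This is only an illustration under stated assumptions, not the Writer implementation: the class TextDocument, its methods, and the ids "id1"/"id2" are hypothetical, the repository is reduced to a plain dictionary keyed by xml:id, and the clipboard carries the paragraph and its xml:id but no metadata.

```python
class TextDocument:
    """Toy model: paragraphs carry an optional xml:id, metadata lives in a repository."""

    def __init__(self):
        self.paragraphs = []   # each entry: {"text": ..., "xml_id": ... or None}
        self.repository = {}   # hypothetical stand-in for the RDF/RDFa repository

    def add_paragraph(self, text, xml_id=None, metadata=None):
        self.paragraphs.append({"text": text, "xml_id": xml_id})
        if xml_id is not None and metadata is not None:
            self.repository[xml_id] = metadata

    def cut(self, index):
        # The paragraph leaves the document; the repository is not touched.
        return self.paragraphs.pop(index)

    def copy(self, index):
        # Only the paragraph and its reference are copied, never the metadata.
        return dict(self.paragraphs[index])

    def paste(self, clip, index=None):
        para = dict(clip)
        used = {p["xml_id"] for p in self.paragraphs}
        # An xml:id may be used only once per document, so a duplicate
        # reference is dropped on paste.
        if para["xml_id"] is not None and para["xml_id"] in used:
            para["xml_id"] = None
        self.paragraphs.insert(len(self.paragraphs) if index is None else index, para)
        return para

    def metadata_of(self, para):
        # An unresolvable or missing reference yields None.
        if para["xml_id"] is None:
            return None
        return self.repository.get(para["xml_id"])


def make_doc():
    doc = TextDocument()
    doc.add_paragraph("first", "id1", {"note": "metadata of paragraph 1"})
    doc.add_paragraph("second", "id2", {"note": "metadata of paragraph 2"})
    return doc


# Action 1: cut. Metadata stays in the repository, nothing references it.
doc = make_doc()
doc.cut(0)
action1_in_repository = "id1" in doc.repository
action1_referenced = any(p["xml_id"] == "id1" for p in doc.paragraphs)

# Action 2: copy and paste in the same document. The copy loses its reference.
doc = make_doc()
action2_reference = doc.paste(doc.copy(0))["xml_id"]

# Action 3: cut and paste in the same document. The reference still resolves.
doc = make_doc()
action3_metadata = doc.metadata_of(doc.paste(doc.cut(0)))

# Action 4: copy and paste into a new document. The reference is kept but dangling.
doc = make_doc()
new_doc = TextDocument()
pasted = new_doc.paste(doc.copy(0), 0)
action4_reference = pasted["xml_id"]
action4_metadata = new_doc.metadata_of(pasted)
```

In this sketch the difference between action 2 and action 3 comes only from the uniqueness check in paste(): after a cut the id is free again, after a copy it is not.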

I hope this explanation resolves your concerns.

BTW, what we need to decide is what happens to metadata that is no longer
referenced (e.g. after action 1) when the text document is saved. Should the
created ODF still contain such metadata, or should the created ODF contain only
metadata that is referenced by a certain entity of the ODF? What is your opinion?

What do you mean by "the major use of metadata is in reusing content"?
Comment 4 discoleo 2008-08-08 11:40:26 UTC
*Reusing Content*

One of the main use-cases of meta-data is for re-using content. Let me explain
this a little bit and I will take a dictionary-style example where we define the
meaning of a new word using meta-data. [Actual use cases are likely to be well
beyond this very simple/trivial example.]

Now, we have the first document where we have this new word and the meta-data.
However, I might i.) copy this new word to a new document or ii.) within the
same document, and I wish that its meta-data is preserved:
  i.) the meta-data should be copied to the new document
 ii.) the 2nd copy should point to the existing meta-data

Let's take a more realistic example: we may have a diagnostic study performed,
and the lab reports the results. These results may contain a lot of meta-data
(like Lab-specific parameters, machine type, test conditions, ...). Now, I
receive these results and wish to write a new document which contains these
results. I will copy/paste the desired data and I would welcome that the
meta-data is copied as well. Basically, copying the meta-data, too, has 2 roles:
  i.) it will flank, and therefore identify the specific data
      in the 2nd document (so it can be automatically detected in this
      document, too)
 ii.) it conveys additional information, not present in the visible text

Now, let's go back to the Actions in the previous e-mail. I still might
misunderstand this feature (or meta-data more globally), but this is *my
meaning* of meta-data (or the way I find it most useful).

Action 1: CUT
  Nothing points to the meta-data, but after paste, the text-object should point
again at this meta-data (either left in place, or copied to the new document IF
it was pasted to a new document)

Action 2: COPY and PASTE as new paragraph
  As we can't predict which paragraph is important (will be re-used for its
content/meta-data), both paragraphs should continue to point to the same
meta-data. Especially, because manually copying data won't ensure that the right
paragraph is copied. [IF the data was handled automatically, the parser could
detect the paragraph that still has the meta-data attached to it, but this is
not true for manual copy/paste.]

Action 3: CUT and PASTE (meta-data is preserved)

Action 4: COPY and PASTE in new document
  The meta-data stream should be copied to the new document and the paragraph
shall point to this new copy.

As I said, the content might traverse different documents:
 PROVIDER 1 => generates meta-data [document 1] => COPIED to document 2
 [rather than CUT/PASTE, the COPY/PASTE is more likely] =>
 COPIED to document 3 => ...

All users will benefit from the meta-data, so all objects originating from the
original object should reference the meta-data (or a copy of the meta-data). As
I said, my expertise in meta-data is very limited, but this is what I understand
from meta-data and how I imagine it being most useful.

When nothing points to the meta-data anymore, then the best way to handle this
is to delete the meta-data (should be undoable as long as the Undo is allowed;
should be deleted completely after the undo is not possible anymore).
Comment 5 Oliver-Rainer Wittmann 2008-08-08 12:30:09 UTC
ad discoleo:
I think I have got your request - I think it is a valid request, but I think
that we can not fully support this request currently.

> Now, we have the first document where we have this new word and the meta-data.
> However, I might i.) copy this new word to a new document or ii.) within the
> same document, and I wish that its meta-data is preserved:
>  i.) the meta-data should be copied to the new document
Currently, we can not support the copy of the meta-data.
This has two reasons:
- a technical one, which would be solvable. The current clipboard implementation
lacks an RDF/RDFa repository.
- a complex one, on which MST can give more insight. The RDF data could be quite
complex - it is a graph, as I was told. It is not an easy task to identify in
general which RDF data has to be copied.
> ii.) the 2nd copy should point to the existing meta-data
This can not work, because the 2nd copy would have the same xml:id and an xml:id
can be assigned only once to a certain entity. See the ODF 1.2 metadata
specification: no two ODF elements can have the same value for their xml:id property.

Your valid use case
> PROVIDER 1 => generates meta-data [document 1] => COPIED to document 2
> [rather than CUT/PASTE, the COPY/PASTE is more likely] =>
> COPIED to document 3 => ...
can be worked around by:
provider 1 => generates meta-data [document 1] => open document 1 and save-as
document 2 => open document 2, delete everything, which is not needed and save
document 2 => open document 2 and save-as document 3 => open document 3, ...
Comment 6 mst.ooo 2008-08-08 13:27:58 UTC
Hi all,

the main problem that I see with copying metadata is identifying which part of
the repository should be considered the metadata that is "connected to" some ODF
element.
This is not trivial because the data model of RDF (which is the standard we use
for metadata) is a graph.
An ODF element may be mapped to an RDF URI, by way of its xml:id attribute.
So you can basically construct arbitrary structures with RDF, and all that you
know about the relationship to the ODF element that is copied is the mapped URI.
If you, say, only copy the RDF statements that contain the URI of the element,
then you will very likely copy too little.
If you, say, copy the entire strongly connected component that the URI node is
part of, as well as all nodes that are reachable from this scc, then you will
likely copy too much (in the limit case, the entire graph).
The problem is that we (the OO.org application) do not know what the _meaning_
of the metadata is (that is kind of the point of the whole enterprise).
Deleting "unused" metadata has similar issues, of course.
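The two copy strategies above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the statements, the "ex:" names and the URI "doc1#id1" are invented, and the graph is a plain set of (subject, predicate, object) tuples rather than a real RDF repository. The second function follows edges from subject to object only; the node's strongly connected component together with everything reachable from it is the same set of nodes as everything reachable from the node itself, so a single traversal covers it.

```python
from collections import deque

ELEMENT = "doc1#id1"  # hypothetical URI mapped from the xml:id of the copied element

# Hypothetical statements as (subject, predicate, object) tuples.
GRAPH = {
    (ELEMENT, "ex:hasResult", "ex:result1"),
    ("ex:result1", "ex:value", "4.2"),
    ("ex:result1", "ex:measuredBy", "ex:machine7"),
    ("ex:machine7", "ex:locatedIn", "ex:lab1"),
    ("ex:lab1", "ex:produced", "doc1#id2"),
    ("doc1#id2", "ex:hasResult", "ex:result2"),
    ("ex:result2", "ex:value", "9.9"),
}


def direct_statements(graph, uri):
    """Only the statements that mention the URI as subject or object."""
    return {t for t in graph if uri in (t[0], t[2])}


def reachable_statements(graph, uri):
    """All statements whose subject is reachable from the URI (breadth-first)."""
    seen = {uri}
    queue = deque([uri])
    result = set()
    while queue:
        node = queue.popleft()
        for s, p, o in graph:
            if s == node:
                result.add((s, p, o))
                if o not in seen:
                    seen.add(o)
                    queue.append(o)
    return result


too_little = direct_statements(GRAPH, ELEMENT)    # misses the value and the machine
too_much = reachable_statements(GRAPH, ELEMENT)   # pulls in the other element's data
```

In this example the first strategy copies a single statement and loses the measured value, while the second ends up copying the whole graph, including the statements about the other element.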

> > ii.) the 2nd copy should point to the existing meta-data
> This can not work, because the 2nd copy would have the same xml:id and an xml:id
> can be assigned only once to a certain entity. See the ODF 1.2 metadata
> specification: no two ODF elements can have the same value for their xml:id
> property.

Well, almost; actually, the problem is that a single URI can be mapped only to a
single xml:id in the manifest. The mapping between xml:id and URI is 1:1. This
is (apparently) intentional.
Comment 7 discoleo 2008-08-08 13:49:05 UTC
Hello all,

my view of the real-life scenario is like this: simple content rarely gets
passed on to another document as it is. More often, a new document is assembled
from various existing documents.

Consider this:
                                                  _
LAB 1 => generate document + meta-data             |
                                                   |
LAB 2 => generate document + meta-data             |
                                                   | =>
LAB 1 => new data + meta-data                      |
                                                   |
RECEIVER => generate other content + meta-data     |
                                                  -

RECEIVER will combine his own generated content with contents from the various
LABS and will generate a new document containing this data and hopefully all the
relevant meta-data along. This is a more realistic use-case. Then he passes this
document along to a new RECEIVER who again will corroborate various sources to
generate a new document with combined meta-data and passes it along to a new
receiver (maybe back to the first one or another one, e.g. a hospital).

With regard to pasting within the same document: I do not have a strong opinion
about this, as in the field I work it won't occur that often. But copying the
meta-data to a new document is something that in my opinion is very important
and the main utility of meta-data.
Comment 8 Oliver-Rainer Wittmann 2008-08-08 14:41:47 UTC
ad discoleo:
I see. In my opinion your use case is a valid one, which should be supported.
But I do not think that it can be solved easily and in general.
Please consider that the given specification should work in general, especially
for metadata which are unknown to the application. If you have a certain
extension installed, which manages a certain type of metadata, this extension
will typically support the copying issue. At least I would expect that such an
extension supports the user on this issue. Such an extension could also forbid
editing or copying of the entities referencing its managed metadata.

Again another workaround for your new use case:
- Create new text document.
- Use function "Menu - Insert - File" to include all needed documents.
- Delete unwanted parts from the resulting document.
Comment 9 Oliver-Rainer Wittmann 2008-08-08 14:46:50 UTC
SUS and I adjusted the wording of the specification.
Comment 10 Oliver-Rainer Wittmann 2008-10-28 11:24:19 UTC
Made some adjustments to the handling of metadata on editing - see wiki.
Some things needed to be changed to simplify implementation.
Comment 11 Oliver-Rainer Wittmann 2009-06-10 12:43:32 UTC
OD->SBA:
Please check the created wiki page to verify this issue
Comment 12 stefan.baltzer 2009-06-17 09:29:59 UTC
Verified on WIKI page.
Comment 13 stefan.baltzer 2009-06-17 10:41:35 UTC
Adjusting Target (CWS shifted to 3.2).