Issue 113059

Summary: DOCX: each comment includes the text of all comments in the imported document
Product: Writer Reporter: ugomatic <ugomatic>
Component: open-import Assignee: Oliver-Rainer Wittmann <orw>
Status: CLOSED FIXED QA Contact:
Severity: Trivial    
Priority: P3 CC: doneyourself, gregor, issues, jon, ml, mmarholin, pavel, pescetti, scisteffan, stfhell
Version: OOO320m18   
Target Milestone: 4.1.0   
Hardware: All   
OS: All   
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=34489
Issue Type: DEFECT Latest Confirmation in: 4.0.1
Developer Difficulty: Easy
Issue Depends on:    
Issue Blocks: 123771    
Attachments:
Description                        Flags
Sample patch                       orw: review+
Word_2010_comments: sample file    none

Description ugomatic 2010-07-09 19:44:20 UTC
When opening a .docx file with comments added by a user using MS Office, I
experience wrong formatting of the comments.

Each comment is displayed as expected in the right column, with an arrow
pointing to where the comment belongs.

However, the text of each comment is made up of the text of all comments to that
document combined. The text of each comment is in the right order, but the
result is misleading. 

Unfortunately I can't submit the relevant attachment, so let me give an example.

Imagine a document with the three following comments:

c1: aaa
c2: bbb
c3: ccc

When opening the .docx file, the text of the three comments appears as:

c1: aaa bbb ccc
c2: aaa bbb ccc
c3: aaa bbb ccc

Please note that the file was commented using MS Office for Mac. It is unclear
whether the original file was created with the Mac or Windows version of MS Office.
I am experiencing this issue on a Mac running OS X 10.6.4; I don't know whether other
platforms are affected differently.
Comment 1 michael.ruess 2010-07-10 15:22:49 UTC
Can confirm this. Sample document to be followed...
Comment 2 michael.ruess 2010-08-16 15:53:20 UTC
*** Issue 113915 has been marked as a duplicate of this issue. ***
Comment 3 montyc 2010-09-14 12:05:16 UTC
... is this going to be fixed any time soon, as without this feature I'd be forced to use proprietary software.
Comment 4 montyc 2010-09-16 03:18:59 UTC
The problem does not occur in version OOo-dev 3.3.0 (Build 9519)
Comment 5 cheysch 2010-11-04 05:25:34 UTC
just installed rc3 (build 9539)
and we are back to the same issue, i will try to install build 9519 as suggested 
by montyc to see if that makes a difference for me
Comment 6 cheysch 2010-11-04 05:58:20 UTC
Being a relative newbie here, I failed to locate build 9519, so no news on that. If 
someone can give me a link to the Ubuntu package or an appropriate website, I am more 
than willing to try. 
Comment 7 ugomatic 2011-02-12 15:40:22 UTC
I've just downloaded OO 3.3 for OS X (build 9567) and the problem has not been solved.
Sadly, this is a major problem when collaborating with non-FOSS users, and might force some of us to go back 
to proprietary software. Does anyone have ideas on how to make this issue more visible within the community?
Comment 8 michael.ruess 2011-03-24 12:43:47 UTC
*** Issue 117531 has been marked as a duplicate of this issue. ***
Comment 9 mlinksva 2011-03-31 14:54:57 UTC
Has been fixed in LibreOffice, patch might be applicable, see https://bugs.freedesktop.org/show_bug.cgi?id=34489
Comment 10 pavel 2012-11-29 17:42:24 UTC
Take this one.

I was hit by this bug today.

Can anyone attach sample document in .docx containing two short paragraphs with one comment in each of them?

Thanks.
Comment 11 pavel 2012-11-29 20:12:51 UTC
Created attachment 79966 [details]
Sample patch

Sample patch attached.

Inspired by the similar LO change - see details in the patchnotes inside.

Any reviewers?
Comment 12 stfhell 2012-12-06 22:52:28 UTC
Created attachment 80006 [details]
Word_2010_comments: sample file

Test kit: sample text file with comments, produced by Word 2010 (saved as DOCX, DOC, ODT) plus Word 2010 screenshot.
Comment 13 Andrea Pescetti 2012-12-15 12:07:59 UTC
Marking as "easy". There are detailed instructions by Pavel Janik on how to solve the problem at http://s.apache.org/vI (ooo-dev mailing list).

I copy and paste them here.

The problem is very simple:

Grab some DOCX document containing more than two comments (so you can check the results). Unzip it.

Investigate the Comments part (see Office Open XML Part 1 - Fundamentals And Markup Language Reference.pdf for more details).

Pretty-print it:

xmllint --format word/comments.xml

Grep for comments:

bash-3.2$ xmllint --format word/comments.xml | grep "<w:comment"

Comments are numbered:

  <w:comment w:id="0" w:author="Deborah" w:date="2010-11-19T16:41:00Z" w:initials="D">
  <w:comment w:id="1" w:author="Deborah" w:date="2010-11-19T14:41:00Z" w:initials="D">
  <w:comment w:id="2" w:author="Deborah" w:date="2010-11-19T14:43:00Z" w:initials="D">
  <w:comment w:id="3" w:author="Deborah" w:date="2010-11-19T14:49:00Z" w:initials="D">
  <w:comment w:id="4" w:author="Deborah" w:date="2010-11-19T14:53:00Z" w:initials="D">
  <w:comment w:id="5" w:author="Deborah" w:date="2010-11-19T14:51:00Z" w:initials="D">

According to wml.xsd, attribute id (w:id) is:

  <xsd:complexType name="CT_Markup">
    <xsd:attribute name="id" type="ST_DecimalNumber" use="required"/>
  </xsd:complexType>

but our model (writerfilter/source/ooxml/model.xml) contains:

        <attribute name="id">
          <text/>

which means it is being handled as a string, which is wrong. It should be ST_DecimalNumber.

And that's all.
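
The distinction between reading w:id as text and reading it as ST_DecimalNumber can be illustrated with a small standalone sketch. It is not taken from writerfilter: the helper names extractIdAttribute and parseDecimalId are hypothetical, and the string search is a stand-in for a real XML parser, good enough only for start tags like the ones listed above.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the raw text of the w:id attribute found in
// one <w:comment ...> start tag, or an empty string if it is absent.
// This is a plain substring search, not an XML parser.
std::string extractIdAttribute(const std::string& tag) {
    const std::string key = "w:id=\"";
    const std::string::size_type start = tag.find(key);
    if (start == std::string::npos) return "";
    const std::string::size_type valueStart = start + key.size();
    const std::string::size_type end = tag.find('"', valueStart);
    if (end == std::string::npos) return "";
    return tag.substr(valueStart, end - valueStart);
}

// Hypothetical helper: interprets the attribute text as a decimal integer,
// which is what a decimal-number type implies. Returns false if the text is
// empty or contains anything that is not part of a decimal number.
// Range errors are not checked in this sketch.
bool parseDecimalId(const std::string& text, long& id) {
    if (text.empty()) return false;
    char* end = nullptr;
    const long value = std::strtol(text.c_str(), &end, 10);
    if (*end != '\0') return false;
    id = value;
    return true;
}
```

Calling extractIdAttribute on a line such as the w:id="3" entry above yields the text "3", and parseDecimalId turns that text into the integer 3, while rejecting non-numeric input.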
Comment 14 Gregor Flüggen 2013-08-19 12:46:50 UTC
Hi,

I would like to change 'latest confirmation on: 3.4.1' to 'latest confirmation on: 4.0.0'. I encountered the issue described with one and the same document in both 3.4.1 and 4.0.0 (on separate PCs).

AOO400m3(Build:9702)  -  Rev. 1503704 2013-07-16 14:54:56 (Di, 16 Jul 2013)

Kind regards,

Gregor
Comment 15 Oliver-Rainer Wittmann 2013-12-02 10:22:42 UTC
Taking over to review and integrate proposed solution.
Comment 16 Oliver-Rainer Wittmann 2013-12-02 16:25:05 UTC
Comment on attachment 79966 [details]
Sample patch

proposed patch looks good.
I will apply it on branch ooxml-osba regarding my work on annotations/comments on text ranges.
Comment 17 Oliver-Rainer Wittmann 2013-12-03 07:25:59 UTC
I investigated the root cause more deeply.
My results are:
Our so-called 'model from OOXML' (file model.xml) contains the following definition:
      <define name="CT_MarkupRangeBookmark">
        <attribute name="id">
          <text/>
          <xs:documentation>Annotation Identifier</xs:documentation>
        </attribute>
        <ref name="CT_MarkupRange"/>
      </define>
But via complex type "CT_MarkupRange" it inherits the definition from complex type "CT_Markup" whose definition is:
      <define name="CT_Markup">
        <attribute name="id">
          <ref name="ST_DecimalNumber"/>
          <xs:documentation>Annotation Identifier</xs:documentation>
        </attribute>
      </define>
This leads to the following generated code in OOXMLFactory_wml.cxx:
    case NN_wml|DEFINE_CT_MarkupRangeBookmark:
        // CT_Markup
        (*pMap)[NS_wordprocessingml|OOXML_id] = NS_ooxml::LN_CT_Markup_id;
        // CT_MarkupRange
        (*pMap)[NS_wordprocessingml|OOXML_displacedByCustomXml] = NS_ooxml::LN_CT_MarkupRange_displacedByCustomXml;
        // CT_MarkupRangeBookmark
        (*pMap)[NS_wordprocessingml|OOXML_id] = NS_rtf::LN_IBKL;
        break;
and
    case NN_wml|DEFINE_CT_MarkupRangeBookmark:
      // CT_Markup
        (*pMap)[NS_wordprocessingml|OOXML_id] = AttributeInfo(RT_Integer, NN_wml|DEFINE_ST_DecimalNumber);
      // CT_MarkupRange
        (*pMap)[NS_wordprocessingml|OOXML_displacedByCustomXml] = AttributeInfo(RT_List, NN_wml|DEFINE_ST_DisplacedByCustomXml);
      // CT_MarkupRangeBookmark
        (*pMap)[NS_wordprocessingml|OOXML_id] = AttributeInfo(RT_String, 0);
        break;
==> duplicate attribute info for attribute id
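
Why the duplicate matters can be shown with a minimal standalone sketch. It assumes the attribute map behaves like std::map, where a second assignment to the same key replaces the first; the real map type is not shown in this report. ResourceType, AttributeInfo and the token constants below are simplified, hypothetical stand-ins for the generated definitions.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins for the generated resource types; the real
// definitions live in the writerfilter sources and are not shown here.
enum ResourceType { RT_Integer, RT_List, RT_String };

struct AttributeInfo {
    ResourceType type;
    int define;
};

// Made-up token values, used only to key the map in this sketch.
const int TOKEN_id = 1;
const int TOKEN_displacedByCustomXml = 2;
const int DEFINE_ST_DecimalNumber = 100;
const int DEFINE_ST_DisplacedByCustomXml = 101;

// Mirrors the order of assignments in the generated case block: the
// inherited CT_Markup entry first, the CT_MarkupRangeBookmark entry last.
std::map<int, AttributeInfo> buildBookmarkAttributes() {
    std::map<int, AttributeInfo> attrs;
    attrs[TOKEN_id] = AttributeInfo{RT_Integer, DEFINE_ST_DecimalNumber};
    attrs[TOKEN_displacedByCustomXml] =
        AttributeInfo{RT_List, DEFINE_ST_DisplacedByCustomXml};
    // Same key as the first assignment: this replaces the integer entry.
    attrs[TOKEN_id] = AttributeInfo{RT_String, 0};
    // Three assignments were made, but only two distinct keys remain.
    assert(attrs.size() == 2);
    return attrs;
}
```

Under that assumption, the entry left for id after the case block runs is the string one, so the integer definition inherited from CT_Markup is lost for this type.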
Comment 18 Oliver-Rainer Wittmann 2013-12-03 09:39:30 UTC
Applying the needed correction to model.xml and investigating further reveals that parsing the comments/annotations sub stream for a certain comment/annotation is handled the same way as parsing the footnotes sub stream for a certain footnote and the endnotes sub stream for a certain endnote. Thus, the IDs of footnotes and endnotes also need to be treated as integers.
Comment 19 SVN Robot 2013-12-03 14:02:52 UTC
"orw" committed SVN revision 1547392 into branches/ooxml-osba:
113059: *.docx import: correct handling of comment/annotation, footnote and e...
Comment 20 Oliver-Rainer Wittmann 2013-12-03 14:04:20 UTC
fixed on branch ooxml-osba according to Pavel's analysis and findings
Comment 21 SVN Robot 2013-12-19 13:38:52 UTC
"orw" committed SVN revision 1552317 into trunk:
113059: *.docx import: correct handling of comment/annotation, footnote and e...
Comment 22 Oliver-Rainer Wittmann 2013-12-19 13:40:24 UTC
fixed on trunk for the next release - thanks and kudos again to Pavel.
Comment 23 liuping 2014-04-10 03:14:20 UTC
verified on windows7 on AOO410m15(Build:9761)  -  Rev. 1583666
2014-04-01 13:53:14 (Di, 01 Apr 2014)