Bug 56205

Summary: [PATCH] Upgrade OOXML schema to 3rd edition (transitional)
Product: POI
Reporter: Andreas Beeker <kiwiwings>
Component: POI Overall
Assignee: POI Developers List <dev>
Status: NEW ---
Severity: enhancement
CC: aldrin.baroi, franziska.herger
Priority: P2
Keywords: PatchAvailable
Version: 3.11-dev   
Target Milestone: ---   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 58149    
Attachments: [PATCH] Upgrade OOXML schema to 3rd edition (transitional)
[PATCH] Upgrade OOXML schema to 3rd edition (transitional)
I have attached the MS Word 2013 for testing purpose.

Description Andreas Beeker 2014-03-03 00:29:51 UTC
Created attachment 31358 [details]
[PATCH] Upgrade OOXML schema to 3rd edition (transitional)

The currently used OOXML schema is a bit outdated and therefore lacks a few of the recently added Office features (e.g. XSSF sheetProtection is missing a few attributes).

The ECMA site [1] has already provided a 4th edition of the schemas.
After fiddling around a bit with the schemas, I think it's ok to use the 3rd edition of the transitional schema, as its main incompatibilities with the 1st edition are the length and percent definitions. The 4th edition would require many more changes in the current POI codebase.

Although the test cases pass with some modifications, I'm not sure what impact this patch would have on users' code, especially when the xmlbeans classes are used directly.

Therefore I'd like to discuss in this entry whether features that aren't covered by the 1st edition schema should be created dynamically without a backing schema, or whether it's ok to potentially break user code that works with the internal XML representation.


[1] http://www.ecma-international.org/publications/standards/Ecma-376.htm
Comment 1 Nick Burch 2014-03-03 10:07:05 UTC
If we use the newer schemas, how does that change what we output? Will it mean that the files we generate stop being compatible with older office versions?

How about input? Will it mean we stop being able to read files generated by older versions of POI, or older versions of office?
Comment 2 Andreas Beeker 2014-03-04 00:41:50 UTC
I thought about comparing the schemas, but the changes are so substantial that this doesn't work.

So the next attempt was to check the POI examples and their output: XSSF and XSLF looked ok, but XWPF has some problems (e.g. the output of the SimpleDocument example doesn't look ok).
So the patch needs some rework, especially as most changes were in the XWPF part.

Apart from the JUnit tests, I haven't checked input processing yet.

There is some information about compatibility in this Office article [1] - note the line: "... writes files conformant to ISO/IEC 29500 Transitional ...". But when you look into the details (e.g. [2] as an example for the other affected percent attributes), you see that although it's able to read the alternative format, it writes the legacy format.

As far as I have checked the changes for length/percent attributes, it depends on POI whether the resulting file can be read by versions < 2010. For example, if measurement units are used in length attributes, the file probably can't be read anymore by those versions. Therefore we would need to take care to stick with the legacy format, if possible, when populating new attributes.
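To illustrate what sticking with the legacy format could mean for length attributes, here is a minimal sketch that normalizes a value with a measurement unit (e.g. "2.54cm") back to a plain integer. It is not part of the patch. The class name UniversalMeasure and the method toTwips are hypothetical, the unit list (mm, cm, in, pt, pc, pi) is my reading of the newer schema's measure pattern, and it assumes the legacy form of the attribute is an integer in twentieths of a point (dxa), which does not hold for every length attribute.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI or of the attached patch.
final class UniversalMeasure {
    // A decimal number followed by a two-letter unit, e.g. "2.54cm" or "-12pt".
    private static final Pattern MEASURE =
        Pattern.compile("(-?[0-9]+(?:\\.[0-9]+)?)(mm|cm|in|pt|pc|pi)");

    private UniversalMeasure() {}

    /**
     * Converts a length attribute value to twips (1/1440 inch).
     * Assumption: a plain integer is the legacy form and is already in twips.
     */
    static long toTwips(String value) {
        String v = value.trim();
        if (v.matches("-?[0-9]+")) {
            return Long.parseLong(v); // legacy form, pass through unchanged
        }
        Matcher m = MEASURE.matcher(v);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a length value: " + value);
        }
        double number = Double.parseDouble(m.group(1));
        double twipsPerUnit;
        switch (m.group(2)) {
            case "in": twipsPerUnit = 1440.0; break;        // 1 inch = 72 pt
            case "cm": twipsPerUnit = 1440.0 / 2.54; break;
            case "mm": twipsPerUnit = 1440.0 / 25.4; break;
            case "pt": twipsPerUnit = 20.0; break;          // 1 pt = 20 twips
            default:   twipsPerUnit = 240.0; break;         // pc, pi: 1 pica = 12 pt
        }
        return Math.round(number * twipsPerUnit);
    }
}
```

A writer that wants to stay readable by versions < 2010 could run values through such a conversion and emit only the resulting integer, e.g. UniversalMeasure.toTwips("1in") instead of the string "1in".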

The new "sharedTypes" namespace [3] seems to stay out of the resulting file.

So I guess in the end it's a trade-off:
- using a new schema and potentially using/introducing features which can only be used in newer Office versions
- vs. keeping the greatest common format, i.e. a schema which only allows one kind of attribute content

> If we use the newer schemas, how does that change what we output? Will it mean that the files we generate stop being compatible with older office versions?
That depends on whether we use the new features.

> How about input? Will it mean we stop being able to read files generated by older versions of POI, or older versions of office?
The 3rd edition transitional schema should be compatible with the 1st edition - but there are certain features like VML, which are being phased out.


[1] http://msdn.microsoft.com/en-US/library/office/gg607163(v=office.14).aspx
[2] http://msdn.microsoft.com/en-us/library/gg548598(v=office.12).aspx
[3] http://schemas.openxmlformats.org/officeDocument/2006/sharedTypes
Comment 3 Andreas Beeker 2014-03-12 23:24:42 UTC
Created attachment 31383 [details]
[PATCH] Upgrade OOXML schema to 3rd edition (transitional)

OK, the SimpleDocument test result was a false positive (or whatever you call it).
The generated document differs only in a few STOnOff attributes and is displayed the same in MS Word irrespective of the schema version.
But when viewed in LibreOffice, both versions are broken.
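As a rough illustration of why differing on/off attribute values need not change the rendering, the sketch below maps the lexical forms to a boolean. The class OnOffValue is hypothetical and not part of POI, and the accepted set (true/false, 1/0, on/off) is an assumption about what the schema permits for these attributes.

```java
// Hypothetical helper, not part of POI or of the attached patch.
final class OnOffValue {
    private OnOffValue() {}

    /**
     * Maps an on/off attribute value to a boolean.
     * Assumption: "true"/"1"/"on" and "false"/"0"/"off" are the only valid forms;
     * matching is case-sensitive, as for other XML schema values.
     */
    static boolean parse(String value) {
        switch (value.trim()) {
            case "true":
            case "1":
            case "on":
                return true;
            case "false":
            case "0":
            case "off":
                return false;
            default:
                throw new IllegalArgumentException("Not an on/off value: " + value);
        }
    }
}
```

Two documents that differ only in which of these spellings they use would compare as equal after this normalization.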

Apart from the POI examples, I did a PNG rendering of themes.pptx and this also looks ok, so the unit conversions should be ok.

I've changed some unit calculations to use the dxa call.
Comment 4 Rathna 2015-07-09 15:35:29 UTC
Created attachment 32894 [details]
I have attached the MS Word 2013 for testing purpose.

The link below explains the error.

http://apache-poi.1045710.n5.nabble.com/Apache-POI-is-not-supporting-MS-Word-2013-Strict-Open-XML-Document-docx-td5719328.html#a5719338
Comment 5 PJ Fanning 2017-10-16 17:33:53 UTC
Hi Rathna,
We don't support Strict OOXML documents. See https://bz.apache.org/bugzilla/show_bug.cgi?id=57699.