Bug 56205 - [PATCH] Upgrade OOXML schema to 3rd edition (transitional)
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: 3.11-dev
Hardware: All All
Importance: P2 enhancement
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords: PatchAvailable
Depends on:
Blocks: 58149
Reported: 2014-03-03 00:29 UTC by Andreas Beeker
Modified: 2017-10-16 17:33 UTC

Attachments
[PATCH] Upgrade OOXML schema to 3rd edition (transitional) (15.29 KB, application/octet-stream)
2014-03-03 00:29 UTC, Andreas Beeker
[PATCH] Upgrade OOXML schema to 3rd edition (transitional) (15.00 KB, application/octet-stream)
2014-03-12 23:24 UTC, Andreas Beeker
I have attached the MS Word 2013 for testing purpose. (11.07 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2015-07-09 15:35 UTC, Rathna

Description Andreas Beeker 2014-03-03 00:29:51 UTC
Created attachment 31358 [details]
[PATCH] Upgrade OOXML schema to 3rd edition (transitional)

The OOXML schema currently in use is somewhat outdated and therefore lacks a few recently added Office features (e.g. XSSF sheetProtection is missing a few attributes).

The ECMA site [1] already provides a 4th edition of the schemas.
After experimenting a bit with the schemas, I think it's OK to use the 3rd edition of the transitional schema, as its main incompatibilities with the 1st edition are the length and percent definitions. The 4th edition would require many more changes to the current POI codebase.

Although the test cases pass with some modifications, I'm not sure what impact this patch would have on users' code, especially when the XMLBeans classes are used directly.

Therefore I'd like to discuss in this entry whether features that aren't covered by the 1st edition schema should be created dynamically without a backing schema, or whether it's OK to potentially break user code that works with the internal XML representation.


[1] http://www.ecma-international.org/publications/standards/Ecma-376.htm
Comment 1 Nick Burch 2014-03-03 10:07:05 UTC
If we use the newer schemas, how does that change what we output? Will it mean that the files we generate stop being compatible with older office versions?

How about input? Will it mean we stop being able to read files generated by older versions of POI, or older versions of office?
Comment 2 Andreas Beeker 2014-03-04 00:41:50 UTC
I thought about comparing the schemas, but the changes are so substantial that this doesn't work.

So the next attempt was to check the POI examples and their output. XSSF and XSLF looked OK, but XWPF has some problems (e.g. the output of the SimpleDocument example doesn't look right).
So the patch needs some rework, especially as most changes were in the XWPF part.

Apart from the JUnit tests, I haven't checked input processing yet.

There is some information about compatibility in this Office article [1]. Note the line "... writes files conformant to ISO/IEC 29500 Transitional ...". But when you look into the details (e.g. [2], as an example for the other affected percent attributes), you see that although Office is able to read an alternative format, it writes the legacy format.

As far as I have checked the changes for length/percent attributes, whether the resulting file can be read by Office versions < 2010 depends on what POI writes. For example, if measurement units are used in length attributes, the file probably can't be read by versions < 2010 anymore. Therefore, when populating new attributes, we would need to take care to stick with the legacy format where possible.
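
To illustrate the length/percent difference discussed above, here is a minimal sketch of normalizing both attribute forms back to the legacy numeric form. It is not taken from the patch and is not POI API: the class `MeasureSketch` and its methods are hypothetical. It assumes that the newer schema allows unit-qualified lengths with the suffixes mm, cm, in, pt, pc and pi and percentages written with a trailing "%", that legacy lengths are plain integers in twentieths of a point (dxa), and that legacy percentages (e.g. table widths of type pct) are plain integers in fiftieths of a percent.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of POI): maps both attribute forms to legacy integers. */
final class MeasureSketch {
    // Assumed lexical form of a unit-qualified length, e.g. "2.5cm" or "12pt".
    private static final Pattern UNIVERSAL =
        Pattern.compile("(-?[0-9]+(?:\\.[0-9]+)?)(mm|cm|in|pt|pc|pi)");

    /** Returns the length in twentieths of a point (dxa). */
    static long toTwips(String value) {
        String v = value.trim();
        Matcher m = UNIVERSAL.matcher(v);
        if (!m.matches()) {
            // Legacy form: a bare integer that is already in twentieths of a point.
            return Long.parseLong(v);
        }
        double n = Double.parseDouble(m.group(1));
        double points;
        switch (m.group(2)) {
            case "mm": points = n * 72.0 / 25.4; break;
            case "cm": points = n * 72.0 / 2.54; break;
            case "in": points = n * 72.0; break;
            case "pt": points = n; break;
            default:   points = n * 12.0; break; // "pc" and "pi" are treated as picas
        }
        return Math.round(points * 20.0);
    }

    /** Returns the percentage in fiftieths of a percent (5000 means 100%). */
    static long toFiftiethsOfPercent(String value) {
        String v = value.trim();
        if (v.endsWith("%")) {
            // Newer form: a decimal number followed by a percent sign.
            double percent = Double.parseDouble(v.substring(0, v.length() - 1));
            return Math.round(percent * 50.0);
        }
        // Legacy form: a bare integer that is already in fiftieths of a percent.
        return Long.parseLong(v);
    }
}
```

With these assumptions, `MeasureSketch.toTwips("1in")` and `MeasureSketch.toTwips("1440")` describe the same length, so a writer that always emits the bare integer stays within the format that older Office versions understand.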

The new "sharedTypes" namespace [3] doesn't seem to appear in the resulting file.

So I guess in the end it's a trade-off between
- using a new schema and potentially using/introducing features which only work in newer Office versions, and
- keeping the greatest common format, i.e. a schema which only allows one kind of attribute content.

> If we use the newer schemas, how does that change what we output? Will it mean that the files we generate stop being compatible with older office versions?
That depends on whether we use the new features.

> How about input? Will it mean we stop being able to read files generated by older versions of POI, or older versions of office?
The 3rd edition transitional schema should be compatible with the 1st edition, but certain features, such as VML, are being phased out.


[1] http://msdn.microsoft.com/en-US/library/office/gg607163(v=office.14).aspx
[2] http://msdn.microsoft.com/en-us/library/gg548598(v=office.12).aspx
[3] http://schemas.openxmlformats.org/officeDocument/2006/sharedTypes
Comment 3 Andreas Beeker 2014-03-12 23:24:42 UTC
Created attachment 31383 [details]
[PATCH] Upgrade OOXML schema to 3rd edition (transitional)

OK, the SimpleDocument test was a false positive (or whatever you call it).
The generated document differs only in a few STOnOff attributes and is displayed the same in MS Word irrespective of the schema version.
But when viewed in LibreOffice, both versions are broken.
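
As a rough illustration of why differing STOnOff attribute values need not change how a document is displayed: a reader that treats the various spellings of an on/off toggle as equivalent sees the same setting either way. The sketch below is hypothetical and not POI or XMLBeans API. It assumes the accepted spellings are "true", "false", "1", "0", "on" and "off".

```java
/** Hypothetical helper (not part of POI): reads an on/off attribute value. */
final class OnOffSketch {
    /** Maps every assumed spelling of an on/off value to a boolean. */
    static boolean parseOnOff(String value) {
        switch (value.trim()) {
            case "1":
            case "true":
            case "on":
                return true;
            case "0":
            case "false":
            case "off":
                return false;
            default:
                // Anything else is not a valid toggle under the stated assumption.
                throw new IllegalArgumentException("Not an on/off value: " + value);
        }
    }
}
```

Under this assumption, `OnOffSketch.parseOnOff("on")` and `OnOffSketch.parseOnOff("true")` are interchangeable, which matches the observation that MS Word renders both outputs identically.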

Apart from the POI examples, I also did a PNG rendering of themes.pptx, and this looks OK too, so the unit conversions should be fine.

I've changed some unit calculations to use the dxa call.
Comment 4 Rathna 2015-07-09 15:35:29 UTC
Created attachment 32894 [details]
I have attached the MS Word 2013 for testing purpose.

The link below explains the error.

http://apache-poi.1045710.n5.nabble.com/Apache-POI-is-not-supporting-MS-Word-2013-Strict-Open-XML-Document-docx-td5719328.html#a5719338
Comment 5 PJ Fanning 2017-10-16 17:33:53 UTC
Hi Rathna,
We don't support Strict OOXML documents. See https://bz.apache.org/bugzilla/show_bug.cgi?id=57699.
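
Since Strict documents are not supported, a caller may want to detect them up front and report a clear message. The following is only a crude sketch using the JDK's zip classes, not POI API: the class `StrictCheckSketch` is hypothetical, and it assumes that Strict packages declare namespaces starting with http://purl.oclc.org/ooxml/ while Transitional packages use the http://schemas.openxmlformats.org/ ones. It does a plain text scan of the XML parts rather than real XML parsing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper (not part of POI): guesses whether a package is Strict OOXML. */
final class StrictCheckSketch {
    // Assumed namespace prefix used by Strict OOXML parts.
    static final String STRICT_PREFIX = "http://purl.oclc.org/ooxml/";

    /** Returns true if any .xml part of the zip package mentions the Strict prefix. */
    static boolean looksStrict(InputStream packageStream) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(packageStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().endsWith(".xml")
                        && readCurrentEntry(zip).contains(STRICT_PREFIX)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Reads the remaining bytes of the current zip entry as UTF-8 text.
    private static String readCurrentEntry(ZipInputStream zip) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = zip.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

A caller could run this check on a file stream before handing the file to POI and fail early if it returns true. The text scan can misfire if the prefix merely appears in document content, so treat the result as a hint only.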