Bug 67396 - Excel getting corrupted in poi 5.0.0 because of standalone = no
Summary: Excel getting corrupted in poi 5.0.0 because of standalone = no
Status: REOPENED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: unspecified
Hardware: PC All
: P1 regression (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-09-14 08:28 UTC by Rakhi Barayanan
Modified: 2023-11-22 12:34 UTC (History)
2 users (show)



Attachments
File before saving (252.93 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2023-09-21 06:55 UTC, Rakhi Barayanan
Details
File after saving--works fine (274.84 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2023-09-21 06:55 UTC, Rakhi Barayanan
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Rakhi Barayanan 2023-09-14 08:28:10 UTC
Hi team,

Could see STANDALONE="NO" IN XML BEHIND THE EXCEL OUTPUT FROM latest poi,this causing  corruption issues  in the 3rd
party applications when accessing the file.


In poi 3.17 ,the xml was having STANDALONE="YES"

Could you please let us know ,how to fix this.
Comment 1 PJ Fanning 2023-09-14 08:38:41 UTC
POI is a volunteer project with no releases in the last 12 months.

You would be better advised to get the 3rd party apps fixed. They should be able to handle XML regardless of what the `standalone` value is in the XML declaration.

We are not in the business of hacking POI just because 3rd party apps have shortcomings.
Comment 2 PJ Fanning 2023-09-14 08:39:56 UTC
If you want, you can fork POI yourself and hack the standalone value as much as you like.
Comment 3 Rakhi Barayanan 2023-09-14 09:05:35 UTC
Could you please let me know why it is changed,or give me the bug id for this change,so that I can revert back that change alone and see.

standalone= yes in poi 3.17 and 
standalone= no in poi 5.0
Comment 4 PJ Fanning 2023-09-14 09:12:11 UTC
The svn history is there for you to work through yourself. Why would an unpaid volunteer do this for you?
Comment 5 PJ Fanning 2023-09-14 10:08:56 UTC
I did a test with POI 5.2.3 and I see that some of the XML files inside the xlsx files created by POI can end up with `standalone="no"`.

See write-example branch of https://github.com/pjfanning/poi-gradle-example

The files I saw with this are:
[Content_Types].xml
docProps/core.xml
_rels/.rels
xl/_rels/workbook.xml.rels

These files seem to be written using org.apache.poi.openxml4j.opc.StreamHelper.

This class uses a javax Transformer and tries to set standalone=yes but it looks like the transformer ignores this.

I added 'net.sf.saxon:Saxon-HE:11.6' to the classpath and this means that Saxon provides the Transformer instead of the default Java one being used. With Saxon, the output has standalone=yes. Note that Saxon-HE 12 has changes that mean it doesn't work with POI/XMLBeans.

So try adding 'net.sf.saxon:Saxon-HE:11.6' to your classpath and see if that helps  in your case.
Comment 6 Rakhi Barayanan 2023-09-14 10:43:32 UTC
Thanks for looking into this.I used Saxon-HE-11.6.jar,but unfortunately it created a 0 byte output xlsx for me.
Comment 7 PJ Fanning 2023-09-14 10:51:28 UTC
https://bugs.openjdk.org/browse/JDK-8230359 might cover the Java issue where it doesn't set the standalone flag correctly.
Comment 8 PJ Fanning 2023-09-14 11:24:35 UTC
I added r1912305 as an experimental fix. Will need to see how this fares in the CI build. It's a bit of a hack but I can't see how how else to workaround the bug in the JDK javax Transformer.
Comment 9 Rakhi Barayanan 2023-09-15 06:14:34 UTC
Thanks for the code change Fanning,that worked for me.You saved my day.
Thanks again for the fix.Will this fix be part of any future release?
Comment 10 PJ Fanning 2023-09-17 16:36:26 UTC
will be in POI 5.2.4 whenever that is released - no date planned yet
Comment 11 Rakhi Barayanan 2023-09-21 06:53:43 UTC
Hi Fanning,

The standalone=yes is not added in all the places,for example it is still missing in these instances.

docProps\app.xml 
xl\sharedStrings.xml 
xl\styles.xml 
xl\workbook.xml 

I shared the jar with code change to one of our customer and it didnot work.But once they opened the excel and saved it ,the size got changed and they were able to work with the saved copy.So something is still missing in the Excel we are creating .

I am attaching the copies of Excel before anf after the save operation.
Comment 12 Rakhi Barayanan 2023-09-21 06:55:21 UTC
Created attachment 39057 [details]
File before saving

File before saving
Comment 13 Rakhi Barayanan 2023-09-21 06:55:51 UTC
Created attachment 39058 [details]
File after saving--works fine
Comment 14 PJ Fanning 2023-09-21 12:17:03 UTC
I have added tests using r1912456 that show that simple XSSF and SXSSF workbooks do not create XML with standalone=no. Provide some code that demos the issue - I have no plans to do further work on this without proof of an issue.
Comment 15 Rakhi Barayanan 2023-09-26 10:21:54 UTC
Thanks for supporting me on this,unfortunately,I couldnot share my code with you.But I have shared the excel file generated ,the saved copy is working fine .I have seen xml file differnces too and shared the details.

docProps\app.xml 
xl\sharedStrings.xml 
xl\styles.xml 
xl\workbook.xml
Comment 16 Rakhi Barayanan 2023-10-05 09:56:20 UTC
Could you please let me know which java code is responsible for generatng this list of xml files.

docProps\app.xml 
xl\sharedStrings.xml 
xl\styles.xml 
xl\workbook.xml
Comment 17 Rakhi Barayanan 2023-11-10 09:04:38 UTC
Hi,

Thanks for looking into this.I have tested with poi 5.2.4.
Unfortuanately it doesnot worked.
One thing I noticed is there is no where we could see "standalone=no" parameter in xml.
But the following xml files miss ,the standalone parameter,so as it is not present that will be treated as no i guess.So I unzipped the excel modified the xml files by adding "standalone = yes" parameter in xml ,zippped it back to excel and it started working

\filename\xl\sharedStrings.xml
\filename\xl\styles.xml
\filename\xl\workbook.xml
\filename\docProps\app.xml
\filename\xl\worksheets\sheet1.xml

So could you please help me to identify which jave modules/files i need to modify to add this parameter inorder to populate in the above xml files.
Comment 18 Rakhi Barayanan 2023-11-15 05:55:09 UTC
Could somebody help me with the filenames where I need to make the modification.
I could see the file POIXMLTypeLoader.java in main\java\org\apache\poi\ooxml,where poi is setting the default xml values,but could not find standalone variable set there.
Is there a way I can set standalone=yes over here which will help me to get this value in all the xml files in the following list

\filename\xl\sharedStrings.xml
\filename\xl\styles.xml
\filename\xl\workbook.xml
\filename\docProps\app.xml
\filename\xl\worksheets\sheet1.xml
Comment 19 PJ Fanning 2023-11-20 14:18:28 UTC
One thing that you could try is writing your code to open an xlsx file and to modify the XML files yourself. Xlsx files are in zip format. If the file is not password protected, you can use commons-compress lib to parse the file and modify its contents.
Comment 20 Rakhi Barayanan 2023-11-21 06:37:20 UTC
I have done this manually and the output file worked.My concern is if i do it like this,it will be again an additional read and write,is there a way to modify the xml on the fly.Before writing it as xlsx.Which part of code is responsible for that.
I could see POIXMLTypeLoader.java is doing many default xml value settings,but I couldnot find a code to set the standalone parameter there.Or do we have a way to set standalone=yes in all the xml files by default.