Bug 65695 - org.apache.poi.ss.extractor.EmbeddedExtractor extract "xls" embedded file can not be opened by office excel
Summary: org.apache.poi.ss.extractor.EmbeddedExtractor extract "xls" embedded file can...
Status: RESOLVED WONTFIX
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 4.1.2-FINAL
Hardware: PC All
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-11-22 02:09 UTC by 1120955357
Modified: 2021-11-23 07:12 UTC (History)
1 user (show)



Attachments
this is origin excel template (57.00 KB, application/vnd.ms-excel)
2021-11-22 02:09 UTC, 1120955357
Details

Note You need to log in before you can comment on or make changes to this bug.
Description 1120955357 2021-11-22 02:09:13 UTC
Created attachment 38097 [details]
this is origin excel template

I am trying to extract embedded files from excel using `org.apache.poi.ss.extractor.EmbeddedExtractor`, But when the embedded file is xls or xlsx, the extracted file can not be opened by office excel. Here is my test case.

```
    @Test
    public void testExtractEmbeddedFiles() throws IOException {
        Workbook workbook = WorkbookFactory.create(new File("template.xls"));
        Sheet attachment1 [details] = workbook.getSheet("attachment1 [details]");
        EmbeddedExtractor extractor = new EmbeddedExtractor();
        List<EmbeddedData> embeddedData = extractor.extractAll(attachment1 [details]);
        for (EmbeddedData embeddedDatum : embeddedData) {
            try (FileOutputStream output = new FileOutputStream(System.currentTimeMillis() + ".xls")) {
                org.apache.commons.io.IOUtils.write(embeddedDatum.getEmbeddedData(),
                        output);
            }
        }
        workbook.close();
    }
```
Comment 1 PJ Fanning 2021-11-22 09:15:26 UTC
I tried this workbook with the latest POI code (basically POI 5.1.0) and did not get an error.

The sheet names in your workbook are:
data
attachment1 [details]
attachment2 [details]

So I used EmbeddedExtractor on attachment1 [details].

What error are you getting? If you have a stacktrace can you provide it?

Can you use POI 5.1.0? POI 4 is no longer supported for patches.
Comment 2 1120955357 2021-11-22 09:42:56 UTC
(In reply to PJ Fanning from comment #1)
> I tried this workbook with the latest POI code (basically POI 5.1.0) and did
> not get an error.
> 
> The sheet names in your workbook are:
> data
> attachment1 [details]
> attachment2 [details]
> 
> So I used EmbeddedExtractor on attachment1 [details].
> 
> What error are you getting? If you have a stacktrace can you provide it?
> 
> Can you use POI 5.1.0? POI 4 is no longer supported for patches.


The code is running normally, but after exporting the embedded file, I can't open it in Office Excel.
Comment 3 1120955357 2021-11-22 09:46:20 UTC
(In reply to 1120955357 from comment #2)
> (In reply to PJ Fanning from comment #1)
> > I tried this workbook with the latest POI code (basically POI 5.1.0) and did
> > not get an error.
> > 
> > The sheet names in your workbook are:
> > data
> > attachment1 [details]
> > attachment2 [details]
> > 
> > So I used EmbeddedExtractor on attachment1 [details].
> > 
> > What error are you getting? If you have a stacktrace can you provide it?
> > 
> > Can you use POI 5.1.0? POI 4 is no longer supported for patches.
> 
> 
> The code is running normally, but after exporting the embedded file, I can't
> open it in Office Excel.

My project is rather old, so upgrading POI to 5.0 is difficult because the maven dependencies after upgrading POI 5.0 conflict with my project.
Comment 4 PJ Fanning 2021-11-22 09:56:28 UTC
I get this when trying to open the embedded xls in OneDrive.

We can't open this workbook. It's set to show only certain named items, but they aren't in the workbook.
You might want to contact the author for more information.

The file opens in Apple Numbers. HSSFWorkbook can parse it.
Comment 5 1120955357 2021-11-22 11:15:05 UTC
(In reply to PJ Fanning from comment #4)
> I get this when trying to open the embedded xls in OneDrive.
> 
> We can't open this workbook. It's set to show only certain named items, but
> they aren't in the workbook.
> You might want to contact the author for more information.
> 
> The file opens in Apple Numbers. HSSFWorkbook can parse it.

Thank you very much for your reply,

Sorry, I can't understand One Drive's error message, I have no idea how to solve the problem

I can use POI to parse the embedded excel correctly, And I can use Microsoft Office to open the embedded excel from the original document.

But I can't open the exported embedded excel using POI in Microsoft Office.Excel, observe the size of the exported files compared with the original file, the size of the exported files is slightly larger than the size of the original document.I don't know why it happened.

By the way, I don't know who you mentioned "You might want to contact the author for more information". Microsoft Office?
Comment 6 1120955357 2021-11-22 11:19:53 UTC
(In reply to 1120955357 from comment #5)
> (In reply to PJ Fanning from comment #4)
> > I get this when trying to open the embedded xls in OneDrive.
> > 
> > We can't open this workbook. It's set to show only certain named items, but
> > they aren't in the workbook.
> > You might want to contact the author for more information.
> > 
> > The file opens in Apple Numbers. HSSFWorkbook can parse it.
> 
> Thank you very much for your reply,
> 
> Sorry, I can't understand One Drive's error message, I have no idea how to
> solve the problem
> 
> I can use POI to parse the embedded excel correctly, And I can use Microsoft
> Office to open the embedded excel from the original document.
> 
> But I can't open the exported embedded excel using POI in Microsoft
> Office.Excel, observe the size of the exported files compared with the
> original file, the size of the exported files is slightly larger than the
> size of the original document.I don't know why it happened.
> 
> By the way, I don't know who you mentioned "You might want to contact the
> author for more information". Microsoft Office?

Now I am trying another solution, using POI to parse the embedded file and copy it to another new workbook. But I can't find any tool class for copying workbooks from POI now.
Comment 7 1120955357 2021-11-23 07:12:57 UTC
Finally, I found the reason. When inserting an xls or xlsx as an embedded file into another excel file. Microsoft Office sets the embedded excel view to be hidden. So using POI to extract the embedded file, Microsoft Office cannot display the workbook in the GUI. So the solution is to call Workbook.setHidden(false).

But XSSFWorkbook::setHidden is not supported yet. I found another topic for this issue and hope it will help you.


https://bz.apache.org/bugzilla/show_bug.cgi?id=61699