Bug 65683 - Calling workbook.removeSheetAt does not remove attachments
Summary: Calling workbook.removeSheetAt does not remove attachments
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF
Version: 4.1.2-FINAL
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Reported: 2021-11-16 08:39 UTC by 1120955357
Modified: 2021-11-22 09:47 UTC

Attachments
Original Excel template (846.50 KB, application/vnd.ms-excel)
2021-11-22 01:50 UTC, 1120955357
File containing only the sheet named "attachment1", extracted from the original Excel file (845.00 KB, application/vnd.ms-excel)
2021-11-22 02:01 UTC, 1120955357

Description 1120955357 2021-11-16 08:39:46 UTC
I have an Excel file that contains several sheets, each containing one or more attachments, and I want to extract one of the sheets and save it to a new workbook.

My approach is to remove all the other sheets from the original workbook and save the result to a new file. This gives me the sheet I need, but the extracted file is basically the same size as the original. I want the new workbook to keep only the attached files that belong to the sheet I need.

Can anyone help me?
Comment 1 Dominik Stadler 2021-11-21 17:26:31 UTC
Can you attach a sample workbook and some code for a reproducible test-case which shows your case and allows others to try to help?
Comment 2 1120955357 2021-11-22 01:50:36 UTC
Created attachment 38095
Original Excel template
Comment 3 1120955357 2021-11-22 02:01:26 UTC
Created attachment 38096
File containing only the sheet named "attachment1", extracted from the original Excel file

This is the test case.

```
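import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.junit.Test; // JUnit 4 assumed; use org.junit.jupiter.api.Test for JUnit 5

public class ExtractSheetTest {
    @Test
    public void testExtractSheet() throws IOException {
        String sheetNameToExtract = "attachment1";
        // "副本" in the file name means "copy".
        // try-with-resources closes the workbook exactly once, even on failure.
        try (Workbook workbook = WorkbookFactory.create(new File("template - 副本.xls"))) {
            boolean found = false;
            // Iterate backwards so removing a sheet does not shift the indexes still to visit.
            for (int i = workbook.getNumberOfSheets() - 1; i >= 0; i--) {
                if (workbook.getSheetName(i).equalsIgnoreCase(sheetNameToExtract)) {
                    found = true;
                } else {
                    workbook.removeSheetAt(i);
                }
            }
            if (!found) {
                throw new FileNotFoundException("can not find sheet: " + sheetNameToExtract);
            }
            // Write the remaining single sheet to a new, timestamp-named file.
            File outputFile = new File(System.currentTimeMillis() + ".xls");
            FileUtils.createParentDirectories(outputFile);
            try (FileOutputStream stream = new FileOutputStream(outputFile)) {
                workbook.write(stream);
            }
        }
    }
}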
```

The original Excel file is 847K, and the sheet named "attachment1" contains only a simple Excel attachment. When that sheet is extracted to a new file, the new file is still 845K, so I guess POI does not delete the attachments that belonged to the removed sheets.
Comment 4 PJ Fanning 2021-11-22 09:47:35 UTC
Maybe you could write custom code to remove the attachments yourself. https://github.com/apache/poi/blob/trunk/poi/src/main/java/org/apache/poi/ss/extractor/EmbeddedExtractor.java will give you an idea how to iterate over the attachments.
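
A minimal sketch of that iteration, assuming the `EmbeddedExtractor.extractAll(Sheet)` and `EmbeddedData.getEmbeddedData()` methods from the class linked above. The class name `EmbeddedObjectReport` and the method `embeddedBytesPerSheet` are hypothetical helpers, not part of POI. The sketch only reports how many embedded bytes each sheet references, which may help confirm which sheets account for the file size. It does not remove anything, and nothing in this report establishes that deleting the shapes would shrink the saved file.

```java
import java.io.File;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.poi.ss.extractor.EmbeddedData;
import org.apache.poi.ss.extractor.EmbeddedExtractor;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

// Hypothetical helper class for illustration; not part of POI.
public class EmbeddedObjectReport {

    /** Maps each sheet name to the total size in bytes of the embedded payloads found on it. */
    public static Map<String, Long> embeddedBytesPerSheet(Workbook workbook) throws IOException {
        EmbeddedExtractor extractor = new EmbeddedExtractor();
        Map<String, Long> result = new LinkedHashMap<>(); // keeps workbook sheet order
        for (Sheet sheet : workbook) {
            long total = 0;
            List<EmbeddedData> items = extractor.extractAll(sheet);
            for (EmbeddedData item : items) {
                byte[] payload = item.getEmbeddedData();
                if (payload != null) {
                    total += payload.length;
                }
            }
            result.put(sheet.getSheetName(), total);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the workbook to inspect, e.g. the attached template
        try (Workbook workbook = WorkbookFactory.create(new File(args[0]))) {
            embeddedBytesPerSheet(workbook).forEach(
                (name, bytes) -> System.out.println(name + ": " + bytes + " bytes embedded"));
        }
    }
}
```

Running this against the original template and against the extracted file would show whether the extracted file's sheet still references the large payloads, or whether they are orphaned data left behind by the removed sheets.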