Bug 65683

Summary: Calling workbook.removeSheetAt does not remove attachments
Product: POI Reporter: 1120955357
Component: XSSF Assignee: POI Developers List <dev>
Status: NEW ---    
Severity: normal CC: 1120955357
Priority: P2    
Version: 4.1.2-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: the original Excel template
the file produced by extracting the whole sheet named "attachment1" from the original Excel file

Description 1120955357 2021-11-16 08:39:46 UTC
I have an Excel file with several sheets, each containing one or more attachments. I want to extract one of the sheets and save it to a new workbook.

My approach is to remove all the other sheets from the original document and save the result to a new file. This gives me the sheet I need, but the extracted file is basically the same size as the original. I want the new workbook to keep only the attachments that belong to the sheet I need.

Can anyone help me?
Comment 1 Dominik Stadler 2021-11-21 17:26:31 UTC
Can you attach a sample workbook and some code for a reproducible test-case which shows your case and allows others to try to help?
Comment 2 1120955357 2021-11-22 01:50:36 UTC
Created attachment 38095
This is the original Excel template.
Comment 3 1120955357 2021-11-22 02:01:26 UTC
Created attachment 38096
This is the file produced by extracting the whole sheet named "attachment1" from the original Excel file.

This is the test case:

```
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.junit.Test; // JUnit 4 assumed; use org.junit.jupiter.api.Test for JUnit 5

public class ExtractSheetTest {
    @Test
    public void testExtractSheet() throws IOException {
        String sheetNameToExtract = "attachment1";

        // "副本" means "copy"; the file name is kept exactly as in the report.
        try (Workbook workbook = WorkbookFactory.create(new File("template - 副本.xls"))) {
            boolean found = false;

            // Walk backwards so removing a sheet does not shift the indexes still to visit.
            for (int i = workbook.getNumberOfSheets() - 1; i >= 0; i--) {
                if (workbook.getSheetName(i).equalsIgnoreCase(sheetNameToExtract)) {
                    found = true;
                } else {
                    workbook.removeSheetAt(i);
                }
            }
            if (!found) {
                throw new FileNotFoundException("can not find sheet: " + sheetNameToExtract);
            }

            // Write the remaining single-sheet workbook to a new, uniquely named file.
            File outputFile = new File(System.currentTimeMillis() + ".xls");
            FileUtils.createParentDirectories(outputFile);
            try (FileOutputStream stream = new FileOutputStream(outputFile)) {
                workbook.write(stream);
            }
        }
    }
}
```

The result: the original Excel file is 847K, and the sheet named "attachment1" contains only one simple embedded Excel file. Yet the extracted file is 845K, so I guess POI does not delete the attachments that belonged to the removed sheets.
Comment 4 PJ Fanning 2021-11-22 09:47:35 UTC
Maybe you could write custom code to remove the attachments yourself. https://github.com/apache/poi/blob/trunk/poi/src/main/java/org/apache/poi/ss/extractor/EmbeddedExtractor.java will give you an idea how to iterate over the attachments.
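
Before writing removal code, it may help to confirm which embedded parts actually survive in the output. The following is only a diagnostic sketch using the JDK, not POI. It assumes the workbook is an .xlsx (OOXML) file, which is a ZIP package, and that embedded objects are stored under `xl/embeddings/` as Excel conventionally writes them. It does not apply to the OLE2 .xls format used in the test case above. The class name `EmbeddedPartLister` and its method are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: lists the embedded parts of an .xlsx package with their sizes.
class EmbeddedPartLister {

    // Assumption: embedded objects live under this prefix, as Excel usually writes them.
    static final String EMBEDDINGS_PREFIX = "xl/embeddings/";

    /** Returns part name -> uncompressed size for every ZIP entry under xl/embeddings/. */
    static Map<String, Long> listEmbeddedParts(File xlsxFile) throws IOException {
        Map<String, Long> parts = new TreeMap<>();
        try (ZipFile zip = new ZipFile(xlsxFile)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().startsWith(EMBEDDINGS_PREFIX)) {
                    parts.put(entry.getName(), entry.getSize());
                }
            }
        }
        return parts;
    }

    // Prints the embedded parts of each file given on the command line.
    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path);
            listEmbeddedParts(new File(path)).forEach(
                (name, size) -> System.out.println("  " + name + "  " + size + " bytes"));
        }
    }
}
```

Running it on both the original and the extracted file and comparing the two listings would show whether the parts belonging to the removed sheets are still present.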