I have an excel workbook A with 3 worksheets. I need to create 3 additional workbooks each contains a worksheet from workbook A. Because currently HSSF model doesn't allow copy sheets, I thought I could workaround by openning the workbook A, remove 2 worksheets, write it back out as workbook A1, (and keep doing the samething until all workbooks have been produced). Here is my code: public void testRemoveSheets() { String savePath = "C:/temp/Doc/"; String sheetNames[] = {"Sheet1", "Sheet2", "Sheet3"}; try { for (int i = 0; i < sheetNames.length; i++) { String sn = sheetNames[i]; POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream(savePath + "TestRemove2.xls")); HSSFWorkbook wb = new HSSFWorkbook(fs); FileOutputStream fos = new FileOutputStream(savePath + "TestRemove2-" + sn + ".xls"); for (int j = 0; j < sheetNames.length; j++) { String s = sheetNames[j]; if (!s.equals(sn)) wb.removeSheetAt(wb.getSheetIndex(s)); } wb.write(fos); fos.close(); } } catch (Exception e) {System.out.println(e.toString());} } Everything seamed to be OK because I received no error. However, when I tried to open the newly generated workbooks, MS Excel (ver 2002) gave me an error. Here is the error log ************** Microsoft Excel File Repair Log Errors were detected in file 'C:\temp\Doc\TestRemove2-Sheet1.xls' The following is a list of repairs: Removed one or more invalid names. ************** Data seems to be OK but the error is annoying/scary to users. And the file size is always bigger than it would be if generated individually. This bug is similar to bug # 29619 - Excel file becomes corrupt after sheet deletion.
Created attachment 12586 [details] Zip with original workbook - corrupt workbooks - error screenshot
(In reply to comment #0) ... > Data seems to be OK but the error is annoying/scary to users. And the file > size is always bigger than it would be if generated individually. ... I have just worked around this issue in a project I'm working on. I haven't been successful in creating a test case, but still thought I'd describe what I know - hopefully it may help isolate what's causing Excel to display this error. * We have Excel 2000 and Excel 2003 in the office. The "Removed one or more invalid names." error only appears in Excel 2003. However, it appears in 2003 even if the input XLS file was created and saved in 2000. * I deleted all charts and macros from the workbook and re-saved, but the error still occurs. * I worked around the error by making sure never to write to any sheets before removing a sheet that precedes them. So e.g. don't write to Sheet3 and then delete Sheet2; instead, delete Sheet2 and *then* write to any sheets that come after it. However, this factor by itself does not cause this error to be displayed. I'm not sure what else also needs to happen.
note: if someone attaches an affected workbook, I imagine running BiffViewer will show that there are still objects referring to the sheet (look for "Sheet1" or the name of the deleted sheet). If that doesn't pan out run POIFSViewer and there must be embedded objects related to the sheet. If we can figure out what those typically are, we should be able to fix this.
Created attachment 14934 [details] Affected worksheet (almost blank) + BiffViewer output Zip file contains a workbook with most of its content removed, but which still gives the "Removed one or more invalid names" error upon opening in Excel 2003. Also contains BiffViewer dump obtained by calling BiffViewer.run() on the affected workbook.
guys please update how to fix this bug
Created attachment 17395 [details] Exel 2003 get this error while viewing. Used removeSheetAt() method
Any update on how to fix or work around this bug? If more info is still needed, let me know. It's very easy to recreate.
The developers are concentrating on any issues affecting the 3.0 release. The release candidate 4 of POI 3.0 is now available at: http://people.apache.org/~nick/POI-3.0-RC4/ Would you please test if this bug is still an issue.
Thanks for the quick response. I downloaded release candidate 4 of POI 3.0, and I still get the error.
Are you using one of the files attached to this issue, or some other file? If some other file then you should attach the file. Also, are you sure of the provenance of the file - do you know where it came from? What version(s) of Excel, and on operating system(s) All this will help my developer decide if they can do anything quickly, or if there is a way to determine what Excel objects are missed after the removeSheetAt call. I think that there needs to be an exception thrown if there are any dependencies in the rest of the Workbook to the Worksheet that is being removed. If POI throws a meaningful exception then that will allow you to inform your user, etc. I'm at the project management level. I'll see what Yegor thinks, but I can't say when we will have more information. Regards, Dave
Thanks, Dave. I attached a zip containing test code and a test Excel file. I also included the resulting Excel file with the error. I created the attached test Excel file using Excel 2002 SP3 on Windows XP. Let me know what you find. cheers! matt
Created attachment 20149 [details] Zip with test code and Excel files
I can tell that this bug is still present in POI 3.0.1 final with Excel Versions 2003 SP3 (Viewer) and Excel 2007. regards robert
I wonder why this bug has still "needinfo" status. It has been opened three years ago and seems to be well documented. If this issue cannot be reproduced by the developers, please let us know what information is needed.
Any update on fix for this bug?
Unfortunately, to fix this bug, we're going to have to find a needle in a haystack (the record with the reference to the deleted sheet - since poi will open the spreadsheet with the removed sheet, it isn't a simple corruption issue) In order to find our needle, we'll initially need three files: * the original file with all the sheets in it * the file after removing the sheet in poi * the file after removing the sheet in excel Compare the latter two files, see the differences, and in theory you've found the record(s) we need to update. Alas it isn't quite that simple... Next, we have some fun with the fact that saving a file in excel and saving it in poi will give slightly different files. So, we need to open files 1 and 3 in poi, and save those again without changes. Now, find the difference between 1 and 1-from-poi, and 3 and 3-from-poi. Knowing those, you can see what differences there are from just the saving. Finally, find the differences between files 2 and 3, and ignore those differences that can be tracked back to excel save vs poi save. Hopefully that will leave you with only one or two differences, which are your needles. Once we know what those records are, we can work to fix. Anyone affected by this issue fancy spending an hour or two hunting down the needle?
I have some additional information that may help some people with a workaround. I'm still working on generating a neat and tidy testcase but, in the meantime, here's what I know: 1) I have a spreadsheet generated with Excel 2003 on a Windows XP machine. It contains a total of 19 sheets. 2) I read in the 19 sheets and try to delete 12 of them. 3) I experience the "corrupted workbook" error whenever I delete the 9th sheet - REGARDLESS of which sheet it is. In other words, I can delete any 8 sheets but if I delete the 9th, it breaks. 4) By adding an additional sheet to my workbook, my code then failed after deleting 10 worksheets. 5) I was able to delete all 12 of the worksheets I needed by simply adding extra blank worksheets at the end. Therefore, for people trying to solve this via a workaround, you might try adding additional blank worksheets (I also hide them using EXCEL so they don't display.)
I was looking at the BIFF dumps of some of these files that don't open after sheet deletion. I noticed what may be the problem. If you have defined a print area when a sheet is printed, Excel remembers the dimensions of the print area in a NAME record. The NAME record has a 0 based index to the sheet where the area is located. Of course, if you delete a sheet, these indexes must be adjusted for any NAME record referring to a sheet after (with a higher index than) the sheet which was deleted. Apparently, POI does not make this adjustment. So for example, the second attachment to this report shows the BIFF dump for a corrupted file. The file has 5 sheets, so the valid sheet indexes are 0 .. 4 There are 4 NAME records, with sheet indexes 1, 2, 4 and 6. The last one is the culprit (I believe) In the case of the fourth attachment, the file result.xls shows 2 sheets named 'Page 1' and 'Page 3'. So the valid sheet indexes would be 0 or 1 There is 1 NAME record with a sheet index of 3.
(In reply to comment #18) > So for example, the second attachment to this report shows the BIFF dump for a > corrupted file. The file has 5 sheets, so the valid sheet indexes are 0 .. 4 NamedRanges are slightly special though - they reference ExternalSheetIndex not the normal Sheet index/order. These ExternalSheetIndexes are allowed to point to sheets that no longer exist If you create a named range pointing at a sheet, then delete that sheet, excel is quite happy to keep that named range. It will have "#REF" for the sheet part, but otherwise be all there Take a look at src/testcases/org/apache/poi/hssf/data/30978-deleted.xls - this has a named range pointing to a deleted sheet, created by excel So, I think this isn't necesserily the problem
(In reply to comment #19) That's not what I see The *formula* for the named range might refer to deleted sheets. But the *sheet index* field of the BIFF (a short integer at offset 8-9), I believe, must refer to an existing sheet. When I look at 30978-deleted.xls, I find three NAME records, for user defined ranges "OnOne", "On2" and "OnThree". Apparently "On2" refers to the deleted Sheet2 somehow. I think the 'formula' part of the NAME record still refers to the deleted sheet. But the Sheet Index field of the record has been adjusted. All three NAME records in 30978-deleted.xls have a Sheet Index of 0. Since there are two sheets in the workbook, the allowed range for the sheet index value should be 0..1, and these NAME records meet that criterion. I believe, to fix this bug, the 'sheet index' field of all NAME records must be appropriately adjusted, and NAME records which belong to a sheet which is deleted must also be removed. Side Note: Even if you do this, deleting a sheet with a macro will still not work, because there is yet another structure that exposes sheet names as property names to the macro object 'ThisWorkbook'. Macros are not involved in this bug, but the presence of macros will cause deleteSheet() to corrupt the workbook.(In reply to comment #19)
Ah, done some more digging and I see what you mean about the sheet index. It can be zero (my test file only had zero), but if it's non-zero then it's a 1-based index to the sheet I've committed a patch to POI to fix up the 1-based sheet indexes as needed when deleting the sheet. However, excel still gives an error about names when opening I think the issue is down to us changing the Ptg when we write the name out. There's a disabled test in svn (usermodel/TestBugs.java#BROKENtest30978) Josh - any ideas why the Ptg is being changed?
Fixed in svn r669809 Ptg class hierarchy needed adjustment for DeletedRef3DPtg and DeletedArea3DPtg. These are not true sub-classes of Ref3DPtg and Area3DPtg. Similar to the fix for bug 45091.
To confirm - I have re-run the test from Matt Hill (attachment 20149 [details]) now that Josh's fix is in. Excel 2003 has no issues when opening the output result.xls, so it looks like we've finally got this one fixed :)