I get an exception when reading some .xlsm files into POI v3.6 with WorkbookFactory.create(fileInputStream). The files open without error in Excel 2007 and OpenOffice 2.3, and I have other .xlsm files that load as expected in POI v3.6.

The source of the error is in xl/drawings/vmlDrawing1.vml. A user-created button has a two-line title, and the two lines are separated by an unclosed <br>, which is what triggers the exception.

Below is an excerpt from vmlDrawing1.vml:

<v:shape id="_x0000_s2060" type="#_x0000_t201"
  style='position:absolute; margin-left:554.25pt;margin-top:78.75pt;width:205.5pt;height:50.25pt; z-index:1;mso-wrap-style:tight'
  o:button="t" fillcolor="buttonFace [67]" strokecolor="windowText [64]" o:insetmode="auto">
  <v:fill color2="buttonFace [67]" o:detectmouseclick="t"/>
  <o:lock v:ext="edit" rotation="t"/>
  <v:textbox o:singleclick="f">
    <div style='text-align:center'><font face="Arial" size="280" color="10"><b>Print Entire <br> Data Set</b></font></div>
  </v:textbox>
  <x:ClientData ObjectType="Button">
    <x:Anchor> 10, 8, 5, 1, 13, 66, 7, 36</x:Anchor>
    <x:PrintObject>False</x:PrintObject>
    <x:AutoFill>False</x:AutoFill>
    <x:FmlaMacro>[0]!Module1.print_entire_data_set</x:FmlaMacro>
    <x:TextHAlign>Center</x:TextHAlign>
    <x:TextVAlign>Center</x:TextVAlign>
  </x:ClientData>
</v:shape>

FYI: I downgraded POI to 3.5-FINAL and the workbook loaded without errors.
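For anyone who wants to see the failure without a workbook, here is a minimal sketch that feeds the text box markup from the excerpt to a strict XML parser. It uses the JDK's built-in DOM parser as a stand-in (an assumption; POI 3.6 uses its own XML stack), and the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of POI.
final class UnclosedBrDemo {

    /** Returns true if the text parses as well-formed XML, false if the parser rejects it. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Thrown for well-formedness errors such as a start tag with no matching end tag.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same text box content as in the excerpt, with the unclosed <br>.
        String asWritten = "<div><b>Print Entire <br> Data Set</b></div>";
        // The same content with the tag self-closed.
        String selfClosed = "<div><b>Print Entire <br/> Data Set</b></div>";

        System.out.println("as written:  " + isWellFormed(asWritten));
        System.out.println("self-closed: " + isWellFormed(selfClosed));
    }
}
```

The JDK parser may also log a fatal error line to stderr for the first input; that is its default error handler, not part of the sketch.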
Created attachment 25214
Spreadsheet contains one button with a multi-line title
I updated the priority to P1 because the bug prevents the use of version 3.6 and because it is triggered by ordinary use of the workbook: Excel itself writes the non-XML-compliant HTML tags.
The bug is really with Excel here: it has generated a file with invalid XML. The xlsx file format is defined as being made up of XML subparts, and the XML spec is very strict about matching tags.

Long term, you should report a bug to Microsoft about this. They either need to sanitise the user input and sort out the tags (e.g. <br> becomes <br />), or they need to give up and escape the whole tag contents wherever iffy data could get added (e.g. put this textbox within a CDATA section).

Short term, you could just comment out the code that reads in the vmlDrawing section of the file, and make sure that you don't touch the drawing records.

Medium term, we should get a list of the things Excel gets wrong, such as <br> (but perhaps others). Then we need to write an XML input wrapper that cleans these up before they get passed to the XML processor for loading. Something like this is quite nasty, though it's possible some other project out there has already done it and we can just re-use what they do.
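To make the medium-term idea concrete, here is a rough sketch of such a cleaning wrapper. It is not POI code; the class and method names are hypothetical. It buffers the whole part in memory and assumes an ASCII-compatible encoding such as UTF-8 (it would not work for UTF-16), and it only handles a bare <br>, not any other tags Excel may leave open.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class BrClosingSketch {

    // Matches <br> or <br   > in any case; <br/> and <br /> do not match and are left alone.
    private static final Pattern UNCLOSED_BR =
            Pattern.compile("<br\\s*>", Pattern.CASE_INSENSITIVE);

    /** Rewrites every unclosed <br> in the text as a self-closed <br/>. */
    static String closeUnclosedBr(String text) {
        return UNCLOSED_BR.matcher(text).replaceAll("<br/>");
    }

    /** Wraps a stream so that the XML parser only ever sees self-closed <br/> tags. */
    static InputStream wrap(InputStream in) throws IOException {
        // ISO-8859-1 maps every byte to one char and back, so non-ASCII bytes
        // (e.g. UTF-8 multi-byte sequences) pass through unchanged.
        String text = new String(in.readAllBytes(), StandardCharsets.ISO_8859_1);
        byte[] fixed = closeUnclosedBr(text).getBytes(StandardCharsets.ISO_8859_1);
        return new ByteArrayInputStream(fixed);
    }

    public static void main(String[] args) throws IOException {
        byte[] part = "<b>Print Entire <br> Data Set</b>".getBytes(StandardCharsets.UTF_8);
        try (InputStream cleaned = wrap(new ByteArrayInputStream(part))) {
            System.out.println(new String(cleaned.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

A real wrapper would probably want to stream rather than buffer, and a regex is a blunt tool for this; the sketch is only meant to show where the clean-up sits relative to the XML processor.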
EvilUnclosedBRFixingInputStream was added in r941399. It's a terrifying, sick workaround, but it does allow your file to be loaded.

The proper fix is still to get Microsoft to make Excel output valid XML, though!