On TIKA-1195, we have a request for handling xlsb formats. I'm opening this issue to track work towards this goal. As I find time, I'll work on this: https://github.com/tballison/poi/tree/xlsb If someone wants to beat me to it, don't let me stop you.
If anyone has a chance to review the xlsb parser, I'd greatly appreciate it. I think that it is close to being ready to be committed. It duplicates more than I'd like, but we can refactor in the future. I couldn't get around some of the bean-based structures, or I had to copy+paste. Ugh. https://github.com/tballison/poi/tree/xlsb Unless there are objections, I'll plan to commit this towards the middle of this coming week.
r1787228 Let me know if there is anything we need to change API-wise before the next release. Any other feedback is welcomed, too.
Broke the build because I failed to update stress tests. Should be fixed r1787264.
Sorry for the late review. Global: Replace 3.15-beta3 with 3.16-beta3 XSSFBHeaderFooter: headerFooterHelper should either be a static final member, or even better, a non-instantiable (static) class with static methods. Docs: use https URLs when possible Docs: spell out full org.apache.poi... rather than abbreviating to o.a.p XSSFBRelation: can getContents be moved up to a superclass (POIXMLRelation)? There isn't any logic here that's specific to XSSFB. XSSFBRichStr: we should try to make this implement the RichString interface later. XSSFBRecordType: lookup can have constant lookup time if we use a Map at the expense of some memory.
XSSFBRichTextString: add @org.apache.poi.util.NotImplemented annotation to stubbed functions that haven't been implemented yet. Global: add @since 3.16-beta3 javadoc to all new classes XSSFBCellHeader: formatAddressAsString can be implemented using new CellAddress(int row, int col).formatAsString(). XSSFBSharedStringsTable: change constructor Javadoc to @since 3.16-beta3. XSSFBSharedStringsTable: should getItems return an unmodifiableList or a copy of the list? XSSFBCellRange: "public static final int length" is the standard modifier order. See http://stackoverflow.com/a/10299123/2683399 for order XSSFBCellRange: this should have tighter integration with o.a.p.ss.util.AreaReference in the future, but AreaReference probably needs some cleanup first. XSSFBCommentsTable: should the natural ordering of CellAddresses be implemented into the CellAddress class itself instead as CellAddressComparator inner class? Or perhaps as a standalone class if both row-major and column-major ordering is needed. XSSFBCommentsTable: add a comment to CellAddressComparator that this implementation is row-major ordering XSSFEventBasedExcelExtractor: any reason for elevating visibility from private? Was this just a quick fix to get the code to work? Is the performance acceptable if you add accessors to private members? If not, does protected work? XSSFBEventBasedExcelExtractor: should we log caught and suppressed exceptions to the POILogger rather than stderr? TestExtractorFactory: replace assertTrue(String.contains(String)) with POITestCase.assertContains(String haystack, String needle) TestXSSFBReader: import assertContains from POITestCase rather than redefining. TestXSSFBEventBasedExcelExtractor: testShapes uses String.indexOf(String) > -1. Consider using or adding something to POITestCase to make the purpose clearer.
Clean up per Javen's feedback.
Javen, Many, many thanks for your careful code review! Changes made r1787320. Let me know if you see anything else worth changing. Details: Global: Replace 3.15-beta3 with 3.16-beta3 >Done XSSFBHeaderFooter: headerFooterHelper should either be a static final member, or even better, a non-instantiable (static) class with static methods. >Changed to static final. Y, we should make that a non-instantiable static class. I was working with what was already in POI. Docs: use https URLs when possible >I found one in package.html, should I also change them in the license headers? Docs: spell out full org.apache.poi... rather than abbreviating to o.a.p >Done. XSSFBRelation: can getContents be moved up to a superclass (POIXMLRelation)? There isn't any logic here that's specific to XSSFB. >Done, and removed from XSSFRelation XSSFBRichStr: we should try to make this implement the RichString interface later. >Agreed. XSSFBRecordType: lookup can have constant lookup time if we use a Map at the expense of some memory. >Done. XSSFBRichTextString: add @org.apache.poi.util.NotImplemented annotation to stubbed functions that haven't been implemented yet. >Done. Global: add @since 3.16-beta3 javadoc to all new classes >Done. XSSFBCellHeader: formatAddressAsString can be implemented using new CellAddress(int row, int col).formatAsString(). >As part of premature optimization, I was trying to cut down on new objects. Turns out that was vestigial to an earlier draft, and I've removed it. XSSFBSharedStringsTable: change constructor Javadoc to @since 3.16-beta3. >deleted. Relying on @since 3.16-beta3 at class level. XSSFBSharedStringsTable: should getItems return an unmodifiableList or a copy of the list? >Yes...always the tradeoff of security vs efficiency. I've modified it to make a copy. XSSFBCellRange: "public static final int length" is the standard modifier order. See http://stackoverflow.com/a/10299123/2683399 for order >Did a global replace in xssf. I suspect we could do this across the project. XSSFBCellRange: this should have tighter integration with o.a.p.ss.util.AreaReference in the future, but AreaReference probably needs some cleanup first. >Agreed. XSSFBCommentsTable: should the natural ordering of CellAddresses be implemented into the CellAddress class itself instead as CellAddressComparator inner class? Or perhaps as a standalone class if both row-major and column-major ordering is needed. >Duh. CellAddress already naturally sorts row-major. I removed the CellAddressComparator in XSSFBCommentsTable. XSSFEventBasedExcelExtractor: any reason for elevating visibility from private? Was this just a quick fix to get the code to work? Is the performance acceptable if you add accessors to private members? If not, does protected work? >All back to private with access via getters. XSSFBEventBasedExcelExtractor: should we log caught and suppressed exceptions to the POILogger rather than stderr? >Done in both XSSFBEvent... and XSSFEvent... TestExtractorFactory: replace assertTrue(String.contains(String)) with POITestCase.assertContains(String haystack, String needle) >Done throughout TestExtractorFactory TestXSSFBReader: import assertContains from POITestCase rather than redefining. >Gah. Thank you. Done. TestXSSFBEventBasedExcelExtractor: testShapes uses String.indexOf(String) > -1. Consider using or adding something to POITestCase to make the purpose clearer >Used POITestCase and fixed TestXSSFEventBasedExcelExtractor throughout as well.