Bug 60826 - Add initial support for SAX-like read-only .xlsb parser
Summary: Add initial support for SAX-like read-only .xlsb parser
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 3.16-dev
Hardware: PC All
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-03-06 20:56 UTC by Tim Allison
Modified: 2017-03-17 10:11 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Tim Allison 2017-03-06 20:56:29 UTC
On TIKA-1195, we have a request for handling xlsb formats. I'm opening this issue to track work towards this goal. 

As I find time, I'll work on this: https://github.com/tballison/poi/tree/xlsb

If someone wants to beat me to it, don't let me stop you.
Comment 1 Tim Allison 2017-03-10 18:22:00 UTC
If anyone has a chance to review the xlsb parser, I'd greatly appreciate it.

I think that it is close to being ready to be committed.

It duplicates more than I'd like, but we can refactor in the future.  I couldn't get around some of the bean-based structures, or I had to copy+paste.  Ugh.

 https://github.com/tballison/poi/tree/xlsb

Unless there are objections, I'll plan to commit this towards the middle of this coming week.
Comment 2 Tim Allison 2017-03-16 18:39:04 UTC
r1787228

Let me know if there is anything we need to change API-wise before the next release.

Any other feedback is welcomed, too.
Comment 3 Tim Allison 2017-03-16 22:33:51 UTC
Broke the build because I failed to update stress tests.

Should be fixed r1787264.
Comment 4 Javen O'Neal 2017-03-16 23:40:17 UTC
Sorry for the late review.

Global: Replace 3.15-beta3 with 3.16-beta3
XSSFBHeaderFooter: headerFooterHelper should either be a static final member, or even better, a non-instantiable (static) class with static methods.
Docs: use https URLs when possible
Docs: spell out full org.apache.poi... rather than abbreviating to o.a.p
XSSFBRelation: can getContents be moved up to a superclass (POIXMLRelation)? There isn't any logic here that's specific to XSSFB.
XSSFBRichStr: we should try to make this implement the RichString interface later.
XSSFBRecordType: lookup can have constant lookup time if we use a Map at the expense of some memory.
Comment 5 Javen O'Neal 2017-03-17 04:34:46 UTC
XSSFBRichTextString: add @org.apache.poi.util.NotImplemented annotation to stubbed functions that haven't been implemented yet.
Global: add @since 3.16-beta3 javadoc to all new classes
XSSFBCellHeader: formatAddressAsString can be implemented using new CellAddress(int row, int col).formatAsString().
XSSFBSharedStringsTable: change constructor Javadoc to @since 3.16-beta3.
XSSFBSharedStringsTable: should getItems return an unmodifiableList or a copy of the list?
XSSFBCellRange: "public static final int length" is the standard modifier order. See http://stackoverflow.com/a/10299123/2683399 for order
XSSFBCellRange: this should have tighter integration with o.a.p.ss.util.AreaReference in the future, but AreaReference probably needs some cleanup first.
XSSFBCommentsTable: should the natural ordering of CellAddresses be implemented into the CellAddress class itself instead as CellAddressComparator inner class? Or perhaps as a standalone class if both row-major and column-major ordering is needed.
XSSFBCommentsTable: add a comment to CellAddressComparator that this implementation is row-major ordering
XSSFEventBasedExcelExtractor: any reason for elevating visibility from private? Was this just a quick fix to get the code to work? Is the performance acceptable if you add accessors to private members? If not, does protected work?
XSSFBEventBasedExcelExtractor: should we log caught and suppressed exceptions to the POILogger rather than stderr?
TestExtractorFactory: replace assertTrue(String.contains(String)) with POITestCase.assertContains(String haystack, String needle)
TestXSSFBReader: import assertContains from POITestCase rather than redefining.
TestXSSFBEventBasedExcelExtractor: testShapes uses String.indexOf(String) > -1. Consider using or adding something to POITestCase to make the purpose clearer.
Comment 6 Tim Allison 2017-03-17 08:26:22 UTC
Clean up per Javen's feedback.
Comment 7 Tim Allison 2017-03-17 10:11:39 UTC
Javen, 
  Many, many thanks for your careful code review!  Changes made r1787320.
Let me know if you see anything else worth changing.

Details:

Global: Replace 3.15-beta3 with 3.16-beta3
	>Done

XSSFBHeaderFooter: headerFooterHelper should either be a static final member, or even better, a non-instantiable (static) class with static methods.
	>Changed to static final.  Y, we should make that a non-instantiable static class.  I was working with what was already in POI.

Docs: use https URLs when possible 
	>I found one in package.html, should I also change them in the license headers?

Docs: spell out full org.apache.poi... rather than abbreviating to o.a.p
	>Done.

XSSFBRelation: can getContents be moved up to a superclass (POIXMLRelation)? There isn't any logic here that's specific to XSSFB.
	>Done, and removed from XSSFRelation

XSSFBRichStr: we should try to make this implement the RichString interface later.
	>Agreed.

XSSFBRecordType: lookup can have constant lookup time if we use a Map at the expense of some memory.
	>Done.

XSSFBRichTextString: add @org.apache.poi.util.NotImplemented annotation to stubbed functions that haven't been implemented yet.
	>Done.

Global: add @since 3.16-beta3 javadoc to all new classes
	>Done.

XSSFBCellHeader: formatAddressAsString can be implemented using new CellAddress(int row, int col).formatAsString().
	>As part of premature optimization, I was trying to cut down on new objects.  Turns out that was vestigial to an earlier draft, and I've removed it.

XSSFBSharedStringsTable: change constructor Javadoc to @since 3.16-beta3.
	>deleted. Relying on @since 3.16-beta3 at class level.
	
XSSFBSharedStringsTable: should getItems return an unmodifiableList or a copy of the list?
	>Yes...always the tradeoff of security vs efficiency.  I've modified it to make a copy.

XSSFBCellRange: "public static final int length" is the standard modifier order. See http://stackoverflow.com/a/10299123/2683399 for order
	>Did a global replace in xssf.  I suspect we could do this across the project. 

XSSFBCellRange: this should have tighter integration with o.a.p.ss.util.AreaReference in the future, but AreaReference probably needs some cleanup first.
	>Agreed.

XSSFBCommentsTable: should the natural ordering of CellAddresses be implemented into the CellAddress class itself instead as CellAddressComparator inner class? Or perhaps as a standalone class if both row-major and column-major ordering is needed.
	>Duh.  CellAddress already naturally sorts row-major.  I removed the CellAddressComparator in XSSFBCommentsTable. 

XSSFEventBasedExcelExtractor: any reason for elevating visibility from private? Was this just a quick fix to get the code to work? Is the performance acceptable if you add accessors to private members? If not, does protected work?
	>All back to private with access via getters.

XSSFBEventBasedExcelExtractor: should we log caught and suppressed exceptions to the POILogger rather than stderr?
	>Done in both XSSFBEvent... and XSSFEvent...

TestExtractorFactory: replace assertTrue(String.contains(String)) with POITestCase.assertContains(String haystack, String needle)
	>Done throughout TestExtractorFactory

TestXSSFBReader: import assertContains from POITestCase rather than redefining.
	>Gah.  Thank you.  Done.

TestXSSFBEventBasedExcelExtractor: testShapes uses String.indexOf(String) > -1. Consider using or adding something to POITestCase to make the purpose clearer
	>Used POITestCase and fixed TestXSSFEventBasedExcelExtractor throughout as well.