(using 3.10.1)

With the introduction of the entity expansion limit, I am seeing the following output in STDERR (System.err):

[Fatal Error] :1:1: JAXP00010001: The parser has encountered more than "4096" entity expansions in this document; this is the limit imposed by the JDK.

This message should not appear in System.err. It should either be handled via an ErrorHandler or be logged through a logging framework.

As we can see from the stack trace, the culprit is in org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable, which does not initialize an error handler. It would probably be sufficient to call sheetParser.setErrorHandler(this) and let the calling library handle this case by extending ReadOnlySharedStringsTable (it already extends DefaultHandler).

    at java.io.FilterOutputStream.write(FilterOutputStream.java:125)
    at java.io.PrintStream.write(PrintStream.java:480)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
    at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
    at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
    at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
    at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
    at java.io.BufferedWriter.flush(BufferedWriter.java:254)
    at java.io.PrintWriter.flush(PrintWriter.java:320)
    at com.sun.org.apache.xerces.internal.util.DefaultErrorHandler.printError(DefaultErrorHandler.java:112)
    at com.sun.org.apache.xerces.internal.util.DefaultErrorHandler.fatalError(DefaultErrorHandler.java:84)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:325)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1302)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1227)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(XMLDocumentFragmentScannerImpl.java:1907)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:3051)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:648)
    at org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable.readFrom(ReadOnlySharedStringsTable.java:140)
    at org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable.<init>(ReadOnlySharedStringsTable.java:111)
    at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:100)
    at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:104)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
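
To illustrate the setErrorHandler(this) idea from this report, here is a minimal standalone sketch. It does not use POI; the class SelfRegisteringHandler and its parse helper are hypothetical names standing in for a DefaultHandler subclass such as ReadOnlySharedStringsTable. It assumes the JDK's built-in SAX parser, which falls back to printing on System.err only when no ErrorHandler has been set on the XMLReader.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a DefaultHandler subclass such as
// ReadOnlySharedStringsTable; this is not POI code.
class SelfRegisteringHandler extends DefaultHandler {
    // Messages of the fatal errors reported to this handler.
    final List<String> fatalErrors = new ArrayList<>();

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        // A subclass could log or wrap the error here instead of recording it.
        fatalErrors.add(e.getMessage());
        throw e; // same outcome as DefaultHandler.fatalError: parsing stops
    }

    // Parses the given XML with the handler registered as both content
    // handler and error handler, and returns the handler for inspection.
    static SelfRegisteringHandler parse(String xml) throws Exception {
        SelfRegisteringHandler handler = new SelfRegisteringHandler();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // The call proposed in this report: with an ErrorHandler set, the
        // parser reports to it rather than to its default System.err printer.
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException reportedAlready) {
            // The fatal error was already recorded by fatalError above.
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // Malformed on purpose: <si> is never closed.
        SelfRegisteringHandler h = parse("<sst><si>unclosed</sst>");
        System.out.println("fatal errors seen by handler: " + h.fatalErrors.size());
    }
}
```

Because DefaultHandler.fatalError already rethrows the exception, registering the handler alone would be enough to keep the message off System.err; the override only shows where a calling library could hook in.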
I'll leave it to one of our XML experts to decide on the best way to handle this.

However... shared strings tables are workbook-wide and can be read independently of any given sheet. As such, there's often no sheet object on hand to report an error through, so I'm not sure your suggested solution will work.
In Solr, we have http://lucene.apache.org/solr/4_9_0/solr-solrj/org/apache/solr/common/util/XMLErrorLogger.html

This class is set as the ErrorHandler when we set up the XMLReader.
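
As a rough sketch of that pattern (not Solr's actual XMLErrorLogger, and not POI's logging API), an ErrorHandler can route parser diagnostics to a logger. The class name LoggingErrorHandler, the parseFails helper, and the use of java.util.logging are assumptions made for illustration only.

```java
import java.io.StringReader;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical handler that sends SAX diagnostics to java.util.logging.
final class LoggingErrorHandler implements ErrorHandler {
    private final Logger log;

    LoggingErrorHandler(Logger log) {
        this.log = log;
    }

    @Override
    public void warning(SAXParseException e) {
        log.log(Level.WARNING, describe(e)); // warnings are logged, parsing continues
    }

    @Override
    public void error(SAXParseException e) throws SAXException {
        throw e; // recoverable errors are escalated to the caller
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        log.log(Level.SEVERE, describe(e)); // logged once, then rethrown
        throw e;
    }

    // Formats the error as "line:column: message".
    private static String describe(SAXParseException e) {
        return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
    }

    // Returns true if parsing the given XML failed with a SAXException.
    static boolean parseFails(String xml, Logger log) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setErrorHandler(new LoggingErrorHandler(log));
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        Logger log = Logger.getLogger("xml.sketch");
        // Malformed on purpose: <b> is never closed.
        System.out.println("parse failed: " + parseFails("<a><b></a>", log));
    }
}
```

Logging the fatal error and then rethrowing it is a choice made for this sketch; the caller still receives the SAXParseException, so a handler that only rethrows would be equally valid. Note that java.util.logging's default console handler itself writes to System.err, so the destination depends on how the logger is configured.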
> As such, there's often no sheet object on hand to report an error through, so I'm not sure your suggested solution will work

But we can use the official logging framework of POI, like in Solr. And since this is a fatal error, you will still get the exception explaining the parse problem in your calling code, so where is the problem?
I tried to reproduce this with the latest 4.0.0-SNAPSHOT version but could not; maybe some of the changes we made over time have already fixed this. If you still see this, please provide a small unit test that shows the problem so we can more easily reproduce and fix it.