Summary: | ArrayIndexOutOfBoundsException when processing certain .doc files | | |
---|---|---|---|
Product: | POI | Reporter: | Advokat <mgr> |
Component: | POI Overall | Assignee: | POI Developers List <dev> |
Status: | RESOLVED FIXED | | |
Severity: | normal | | |
Priority: | P2 | | |
Version: | 3.17-FINAL | | |
Target Milestone: | --- | | |
Hardware: | PC | | |
OS: | All | | |
Attachments: | This File should reproduce the issue | | |
Created attachment 35616: This File should reproduce the issue

When Solr (7.1.0) tries to parse this .doc file, we get the exception shown in the response below.

It seems to be related to an older variant of the .doc format, because converting the .doc to a .docx and then back to a .doc fixes the issue.

{
  "responseHeader":{
    "status":500,
    "QTime":265},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
    "msg":"org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@20395b83",
    "trace":"org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@20395b83\r\n\tat org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)\r\n\tat org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)\r\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)\r\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2484)\r\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:720)\r\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:526)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)\r\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)\r\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\r\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\r\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\r\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\r\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\r\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\r\n\tat org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\r\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\r\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)\r\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)\r\n\tat org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\r\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\r\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\r\n\tat java.lang.Thread.run(Unknown Source)\r\nCaused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@20395b83\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\r\n\tat org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)\r\n\tat org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)\r\n\t... 34 more\r\nCaused by: java.lang.ArrayIndexOutOfBoundsException: -1\r\n\tat org.apache.poi.hwpf.model.StyleSheet.getCharacterStyle(StyleSheet.java:329)\r\n\tat org.apache.poi.hwpf.model.CHPX.getCharacterProperties(CHPX.java:74)\r\n\tat org.apache.poi.hwpf.usermodel.CharacterRun.<init>(CharacterRun.java:100)\r\n\tat org.apache.poi.hwpf.usermodel.Range.getCharacterRun(Range.java:727)\r\n\tat org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:227)\r\n\tat org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:712)\r\n\tat org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:702)\r\n\tat org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:174)\r\n\tat org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:175)\r\n\tat org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\r\n\t... 38 more\r\n",
    "code":500}}
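
The innermost "Caused by" in the trace is java.lang.ArrayIndexOutOfBoundsException: -1 at StyleSheet.getCharacterStyle(StyleSheet.java:329), reached via PicturesTable.getAllPictures. That suggests a character run in the file carries a style index that is not valid for the style table. A minimal sketch of the failure mode and of a bounds guard for such a lookup is given here for illustration. StyleTableSketch, unguarded and guarded are hypothetical names; this is not POI's code and is not necessarily the fix that was committed.

```java
/**
 * Hypothetical stand-in for a style table indexed by style number.
 * Illustration only; this is NOT org.apache.poi.hwpf.model.StyleSheet.
 */
final class StyleTableSketch {
    private final String[] styles;

    StyleTableSketch(String... styles) {
        // Defensive copy so callers cannot mutate the table afterwards.
        this.styles = styles.clone();
    }

    /** Unguarded lookup: an index such as -1 throws ArrayIndexOutOfBoundsException. */
    String unguarded(int styleIndex) {
        return styles[styleIndex];
    }

    /** Guarded lookup: any out-of-range index yields the supplied fallback instead. */
    String guarded(int styleIndex, String fallback) {
        if (styleIndex < 0 || styleIndex >= styles.length) {
            return fallback;
        }
        return styles[styleIndex];
    }

    public static void main(String[] args) {
        StyleTableSketch table = new StyleTableSketch("Normal", "Heading 1");

        // Valid index: returns the stored style.
        System.out.println(table.guarded(1, "Default"));

        // Invalid index, as in the trace above: falls back instead of throwing.
        System.out.println(table.guarded(-1, "Default"));

        // The unguarded variant fails in the same way as the reported trace.
        try {
            table.unguarded(-1);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded lookup failed: " + e.getClass().getName());
        }
    }
}
```

Whether falling back to a default is the right behaviour for a real parser, as opposed to rejecting the file, is a design decision this sketch does not settle.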