Bug 61911 - ArrayIndexOutOfBoundsException when processing certain .doc files
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: 3.17-FINAL
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Reported: 2017-12-15 15:28 UTC by Advokat
Modified: 2017-12-28 08:46 UTC

Attachments
This File should reproduce the issue (27.50 KB, application/msword)
2017-12-15 15:28 UTC, Advokat

Description Advokat 2017-12-15 15:28:22 UTC
Created attachment 35616
This File should reproduce the issue

When Solr 7.1.0 tries to parse this .doc file, the request fails with the exception shown below.
The problem seems to be related to an older variant of the .doc format: converting the file to .docx and then back to .doc makes the error go away.

{
  "responseHeader":{
    "status":500,
    "QTime":265},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
    "msg":"org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@20395b83",
    "trace":"org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@20395b83\r\n\tat org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)\r\n\tat org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)\r\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)\r\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2484)\r\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:720)\r\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:526)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)\r\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)\r\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\r\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\r\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\r\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\r\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\r\n\tat 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\r\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\r\n\tat org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\r\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\r\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)\r\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)\r\n\tat org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\r\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\r\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\r\n\tat java.lang.Thread.run(Unknown Source)\r\nCaused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@20395b83\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\r\n\tat org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)\r\n\tat org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)\r\n\t... 
34 more\r\nCaused by: java.lang.ArrayIndexOutOfBoundsException: -1\r\n\tat org.apache.poi.hwpf.model.StyleSheet.getCharacterStyle(StyleSheet.java:329)\r\n\tat org.apache.poi.hwpf.model.CHPX.getCharacterProperties(CHPX.java:74)\r\n\tat org.apache.poi.hwpf.usermodel.CharacterRun.<init>(CharacterRun.java:100)\r\n\tat org.apache.poi.hwpf.usermodel.Range.getCharacterRun(Range.java:727)\r\n\tat org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:227)\r\n\tat org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:712)\r\n\tat org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:702)\r\n\tat org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:174)\r\n\tat org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:175)\r\n\tat org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\r\n\t... 38 more\r\n",
    "code":500}}
Comment 1 Dominik Stadler 2017-12-28 08:46:28 UTC
Fixed with r1819403 by adding more checks for invalid indices.
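
For readers unfamiliar with this kind of fix, the sketch below illustrates what "checks for invalid indices" can look like in general. It is not the actual change made in r1819403: the class `StyleLookup` and its method `styleAt` are hypothetical names invented for this example and do not exist in POI. The idea is simply that an index read from a damaged or unusual file (such as the -1 in the trace above) is validated before it is used to access an array, so the caller gets an empty result instead of an ArrayIndexOutOfBoundsException.

```java
import java.util.Optional;

// Hypothetical stand-in for a style table; this is NOT POI's StyleSheet class.
public class StyleLookup {
    private final String[] styles;

    public StyleLookup(String[] styles) {
        // Defensive copy so later changes to the caller's array do not leak in.
        this.styles = styles.clone();
    }

    /**
     * Returns the style stored at the given index, or an empty Optional when
     * the index is negative, past the end of the table, or the slot is null.
     */
    public Optional<String> styleAt(int index) {
        if (index < 0 || index >= styles.length) {
            return Optional.empty();
        }
        return Optional.ofNullable(styles[index]);
    }

    public static void main(String[] args) {
        StyleLookup lookup = new StyleLookup(new String[] {"Normal", "Heading 1"});
        // A valid index returns the stored style.
        System.out.println(lookup.styleAt(1).orElse("<default>"));
        // An invalid index such as -1 falls back to a default instead of throwing.
        System.out.println(lookup.styleAt(-1).orElse("<default>"));
    }
}
```

With a guard of this shape, a caller can decide how to handle a missing style (for example by falling back to default properties) rather than letting the whole document parse fail.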