Hi,

1. We're using a single-core Solr 6.4 instance on Windows Server 2012 R2 Standard.
2. Java 8 (build 1.8.0_121-b13).
3. ooxml-schemas-1.3.jar, poi-3.15.jar, poi-ooxml-3.15.jar, poi-scratchpad-3.15.jar.

We still get Solr exceptions/errors for ~2000 vsdx files. It is critical for us to have them indexed, so any solutions are welcome. For most of them I see this exception: org.apache.poi.POIXMLException: Invalid 'Row_Type' name 'PolylineTo'

For example:

{ "responseHeader": { "status": 500, "QTime": 65 }, "error": { "msg": "org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@3c9f695c", "code": 500, "trace": "org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@3c9f695c\r\n\tat org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)\r\n\tat org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)\r\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)\r\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)\r\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)\r\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)\r\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)\r\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)\r\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\r\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\r\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\r\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\r\n\tat org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\r\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\r\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\r\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\r\n\tat org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\r\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\r\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\r\n\tat java.lang.Thread.run(Unknown Source)\r\nCaused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser@3c9f695c\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\r\n\tat org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)\r\n\tat org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)\r\n\t... 32 more\r\nCaused by: org.apache.poi.POIXMLException: /visio/masters/masters.xml: /visio/masters/master11.xml: <Shape ID=\"11\">: Invalid 'Row_Type' name 'PolylineTo'\r\n\tat org.apache.poi.xdgf.exceptions.XDGFException.wrap(XDGFException.java:43)\r\n\tat org.apache.poi.xdgf.usermodel.XDGFMasters.onDocumentRead(XDGFMasters.java:107)\r\n\tat org.apache.poi.xdgf.usermodel.XmlVisioDocument.onDocumentRead(XmlVisioDocument.java:106)\r\n\tat org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:190)\r\n\tat org.apache.poi.xdgf.usermodel.XmlVisioDocument.<init>(XmlVisioDocument.java:79)\r\n\tat org.apache.poi.xdgf.extractor.XDGFVisioExtractor.<init>(XDGFVisioExtractor.java:41)\r\n\tat org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:207)\r\n\tat org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)\r\n\tat org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\r\n\t... 
35 more\r\nCaused by: org.apache.poi.POIXMLException: Invalid 'Row_Type' name 'PolylineTo'\r\n\tat org.apache.poi.xdgf.util.ObjectFactory.load(ObjectFactory.java:45)\r\n\tat org.apache.poi.xdgf.usermodel.section.geometry.GeometryRowFactory.load(GeometryRowFactory.java:58)\r\n\tat org.apache.poi.xdgf.usermodel.section.GeometrySection.<init>(GeometrySection.java:55)\r\n\tat org.apache.poi.xdgf.usermodel.XDGFSheet.<init>(XDGFSheet.java:77)\r\n\tat org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:113)\r\n\tat org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:125)\r\n\tat org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:125)\r\n\tat org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:125)\r\n\tat org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:107)\r\n\tat org.apache.poi.xdgf.usermodel.XDGFBaseContents.onDocumentRead(XDGFBaseContents.java:82)\r\n\tat org.apache.poi.xdgf.usermodel.XDGFMasterContents.onDocumentRead(XDGFMasterContents.java:66)\r\n\tat org.apache.poi.xdgf.usermodel.XDGFMasters.onDocumentRead(XDGFMasters.java:101)\r\n\t... 43 more\r\n", "metadata": [ "error-class", "org.apache.solr.common.SolrException", "root-error-class", "org.apache.poi.POIXMLException" ] } }
Can you please attach a small problematic file to this bugzilla ticket, so we can take a look?
Created attachment 34906: visio file example
Please see the attached Visio file.
Thanks for that, failing (disabled) unit test added in r1791098. It looks like someone will need to look up the specs on PolylineTo row type, then implement a rowtype class for that. We have a number of other *To row type classes implemented, so hopefully not too much work once someone finds the magic bit in the spec which details how these should work!
When can we expect a new release with this "fix" implemented? Is there any possibility of getting a beta version earlier?
A nightly build would be available the day after some kind person implements it. A full release is usually done a few times a year. We're all volunteers here! If this bug matters to you, please help by digging through the public documentation and helping work out what this new row type needs to do/implement/etc!
Actually, this turned out to be easier than expected and no spec reading was required: there seem to be two variants, PolylineTo and PolyLineTo (note the L can be l or L). In r1791108 the other form has been added as an alias. For future reference on missing XDGF features, the public docs on the VSDX file format are linked from http://poi.apache.org/guidelines.html#FileFormatInformation
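To illustrate the alias idea for anyone hitting a similar "Invalid 'Row_Type' name" error, here is a minimal sketch of a name-to-class registry that accepts both spellings. This is not the actual POI code from r1791108: the class RowTypeRegistry, the GeometryRow interface and the row classes are all hypothetical stand-ins, and only the two row type names and the error message wording come from this ticket.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of a row-type registry; NOT the real POI GeometryRowFactory. */
public class RowTypeRegistry {

    /** Stand-in for a geometry row implementation (assumed interface). */
    public interface GeometryRow {
        String kind();
    }

    /** Hypothetical row class handling polyline rows. */
    public static final class PolylineRow implements GeometryRow {
        public String kind() { return "polyline"; }
    }

    /** Hypothetical row class handling straight line rows. */
    public static final class LineRow implements GeometryRow {
        public String kind() { return "line"; }
    }

    // Maps the row type name found in the XML to a constructor for the row class.
    private static final Map<String, Supplier<GeometryRow>> ROW_TYPES = new HashMap<>();

    static {
        ROW_TYPES.put("LineTo", LineRow::new);
        ROW_TYPES.put("PolyLineTo", PolylineRow::new);
        // Alias: the same row class registered under the other spelling,
        // which is the one rejected in the stack trace above.
        ROW_TYPES.put("PolylineTo", PolylineRow::new);
    }

    /** Looks up a row type by name; unknown names fail like the reported error. */
    public static GeometryRow load(String name) {
        Supplier<GeometryRow> factory = ROW_TYPES.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Invalid 'Row_Type' name '" + name + "'");
        }
        return factory.get();
    }

    public static void main(String[] args) {
        // Both spellings resolve to the same hypothetical row class.
        System.out.println(load("PolyLineTo").kind());
        System.out.println(load("PolylineTo").kind());
    }
}
```

Because the lookup is an exact string match, a difference in a single letter's case is enough to trigger the failure, which is why registering the second spelling resolves it without any new parsing logic.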