Bug 46577 - POI engine logs errors about invalid uri files on certain office 2007 documents
Summary: POI engine logs errors about invalid uri files on certain office 2007 documents
Status: RESOLVED INVALID
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: 3.5-dev
Hardware: PC Windows XP
Importance: P1 critical
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-01-21 11:46 UTC by sreeni
Modified: 2009-04-20 10:47 UTC

Attachments
PPTX file to be extracted (869.37 KB, application/vnd.openxmlformats-officedocument.presentationml.presentation)
2009-01-21 11:46 UTC, sreeni
docx file to be extracted (46.46 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2009-01-21 11:47 UTC, sreeni

Description sreeni 2009-01-21 11:46:41 UTC
Created attachment 23152
PPTX file to be extracted

Please use the attached Office 2007 pptx and docx files and try to extract the text. You will see log output like the following:


127.0.0.1:41812-m9L4iMsg015630] INFO  org.openxml4j.opc  - target contains \
therefore not a valid
URIfile:///C:\Ilias\Projects\MERIT\Beam%20Instrumentation%20and%20Optics\MERIT_OpticsSummary.xlsx
replaced by /
Comment 1 sreeni 2009-01-21 11:47:24 UTC
Created attachment 23153
docx file to be extracted
Comment 2 Yegor Kozlov 2009-04-20 10:47:59 UTC
I don't see a bug here.

Firstly, both attached files contain absolute references to external resources. For example, 2007_Calendar.docx contains this one:
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/attachedTemplate" Target="file:///C:\Documents%20and%20Settings\Keith%20C.%20Brown\Application%20Data\Microsoft\Templates\2007%20calendar.dotx" TargetMode="External"/>

POI can only process embedded OPC resources.
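
To illustrate how such external references can be spotted, here is a minimal sketch that parses relationship XML with the JDK's DOM parser and collects every Target marked TargetMode="External". It is not POI code: the class name ExternalRelationships, the method externalTargets, and the second (embedded image) relationship in the sample are hypothetical and used only for contrast.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of POI: lists external relationship targets.
public class ExternalRelationships {

    /** Returns the Target of every Relationship element with TargetMode="External". */
    public static List<String> externalTargets(String relsXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(relsXml.getBytes(StandardCharsets.UTF_8)));
            // "*" matches any namespace, so the lookup works with or without xmlns.
            NodeList nodes = doc.getElementsByTagNameNS("*", "Relationship");
            List<String> targets = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element rel = (Element) nodes.item(i);
                // A missing TargetMode attribute yields "", i.e. an embedded (internal) part.
                if ("External".equals(rel.getAttribute("TargetMode"))) {
                    targets.add(rel.getAttribute("Target"));
                }
            }
            return targets;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse relationship XML", e);
        }
    }

    public static void main(String[] args) {
        String relsXml =
            "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">"
            // External reference quoted from 2007_Calendar.docx.
            + "<Relationship Id=\"rId1\""
            + " Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/attachedTemplate\""
            + " Target=\"file:///C:\\Documents%20and%20Settings\\Keith%20C.%20Brown\\Application%20Data\\Microsoft\\Templates\\2007%20calendar.dotx\""
            + " TargetMode=\"External\"/>"
            // Hypothetical embedded relationship, added only for contrast.
            + "<Relationship Id=\"rId2\""
            + " Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/image\""
            + " Target=\"media/image1.png\"/>"
            + "</Relationships>";
        for (String target : externalTargets(relsXml)) {
            System.out.println("external: " + target);
        }
    }
}
```

Only the attachedTemplate entry is reported, because the image relationship has no TargetMode attribute and is therefore treated as embedded.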

Secondly, these absolute references are invalid because they contain backslashes, while only forward slashes are allowed. POI strictly follows the OPC spec and issues a warning. This is expected behavior.
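
The backslash problem can be reproduced outside POI with plain java.net.URI. The sketch below is an illustration only, not the openxml4j implementation: the class name ExternalTargetCheck and its methods are hypothetical, and the normalize step merely assumes that "replaced by /" in the log means swapping each backslash for a forward slash.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of POI: checks and normalizes a relationship target.
public class ExternalTargetCheck {

    /** Returns true if java.net.URI accepts the target string. */
    public static boolean isValidUri(String target) {
        try {
            new URI(target);
            return true;
        } catch (URISyntaxException e) {
            // A backslash is an illegal character in a URI, so parsing fails here.
            return false;
        }
    }

    /** Assumed equivalent of the logged "replaced by /" step. */
    public static String normalize(String target) {
        return target.replace('\\', '/');
    }

    public static void main(String[] args) {
        // Target quoted from 2007_Calendar.docx.
        String target = "file:///C:\\Documents%20and%20Settings\\Keith%20C.%20Brown"
                + "\\Application%20Data\\Microsoft\\Templates\\2007%20calendar.dotx";
        String fixed = normalize(target);
        System.out.println("raw target valid:        " + isValidUri(target));
        System.out.println("normalized target:       " + fixed);
        System.out.println("normalized target valid: " + isValidUri(fixed));
    }
}
```

The raw target is rejected and the normalized one parses, which matches the message in the log: the reference is repaired and reported rather than treated as a failure.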

Yegor