Bug 46577 - POI engine logs errors about invalid uri files on certain office 2007 documents
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: 3.5-dev
Hardware: PC Windows XP
Importance: P1 critical
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2009-01-21 11:46 UTC by sreeni
Modified: 2009-04-20 10:47 UTC

PPTX file to be extracted (869.37 KB, application/vnd.openxmlformats-officedocument.presentationml.presentation)
2009-01-21 11:46 UTC, sreeni
docx file to be extracted (46.46 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2009-01-21 11:47 UTC, sreeni

Description sreeni 2009-01-21 11:46:41 UTC
Created attachment 23152 [details]
PPTX file to be extracted

Please use the attached Office 2007 .pptx and .docx files and try to extract the text. You will see output like the following:

  INFO  org.openxml4j.opc  - target contains \
  therefore not a valid
  replaced by /
Comment 1 sreeni 2009-01-21 11:47:24 UTC
Created attachment 23153 [details]
docx file to be extracted
Comment 2 Yegor Kozlov 2009-04-20 10:47:59 UTC
I don't see a bug here.

Firstly, both attached files contain absolute references to external resources. For example, 2007_Calendar.docx contains this one:
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/attachedTemplate" Target="file:///C:\Documents%20and%20Settings\Keith%20C.%20Brown\Application%20Data\Microsoft\Templates\2007%20calendar.dotx" TargetMode="External"/>

POI can only process embedded OPC resources.
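For illustration, here is a minimal sketch of how such a relationship can be recognized as pointing outside the package. It uses only the JDK's DOM parser, not POI; the class and method names (RelationshipCheck, isExternal) are hypothetical. It relies on the OPC rule that TargetMode is optional and defaults to "Internal", so only an explicit TargetMode="External" marks a non-embedded target.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class RelationshipCheck {
    // Hypothetical helper (not a POI API): reports whether a single
    // <Relationship> element targets a resource outside the OPC package.
    static boolean isExternal(String relationshipXml) {
        try {
            Element rel = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(relationshipXml)))
                    .getDocumentElement();
            // TargetMode is optional and defaults to "Internal";
            // getAttribute returns "" when the attribute is absent.
            return "External".equals(rel.getAttribute("TargetMode"));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed <Relationship> element", e);
        }
    }

    public static void main(String[] args) {
        // The attachedTemplate relationship quoted from 2007_Calendar.docx.
        String rel = "<Relationship Id=\"rId1\" "
                + "Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/attachedTemplate\" "
                + "Target=\"file:///C:\\Documents%20and%20Settings\\Keith%20C.%20Brown"
                + "\\Application%20Data\\Microsoft\\Templates\\2007%20calendar.dotx\" "
                + "TargetMode=\"External\"/>";
        System.out.println(isExternal(rel)); // true
    }
}
```

A text extractor that only reads embedded parts would simply skip relationships for which this returns true.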

Secondly, these absolute references are invalid because they contain backslashes, while only forward slashes are allowed. POI strictly follows the OPC spec, so it issues a warning and replaces the backslashes with forward slashes. This is expected behavior.
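The backslash problem can be reproduced with the JDK alone: java.net.URI rejects '\' as an illegal character. The sketch below is not POI's actual code; TargetUriCheck, isValidUri and normalizeTarget are hypothetical names, and the slash replacement merely mirrors what the "replaced by /" log message describes.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class TargetUriCheck {
    // Returns true if the target parses as a URI according to java.net.URI.
    static boolean isValidUri(String target) {
        try {
            new URI(target);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Hypothetical lenient fallback: swap every backslash for a forward
    // slash, then parse. URI.create throws IllegalArgumentException if the
    // result is still not a valid URI.
    static URI normalizeTarget(String target) {
        return URI.create(target.replace('\\', '/'));
    }

    public static void main(String[] args) {
        String target = "file:///C:\\Documents%20and%20Settings\\Keith%20C.%20Brown"
                + "\\Application%20Data\\Microsoft\\Templates\\2007%20calendar.dotx";
        System.out.println(isValidUri(target)); // false: '\' is illegal in a URI
        URI fixed = normalizeTarget(target);
        System.out.println(fixed.getPath());    // decoded path with '/' separators
    }
}
```

Even after normalization the target is still a file: URI outside the package, so it does not become extractable content.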