Bug 59183 - New exception parsing dates with timezone offsets in OPC with POI 3.14
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: OPC
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Duplicates: 59204
Reported: 2016-03-15 13:02 UTC by Tim Allison
Modified: 2016-03-21 14:38 UTC

Attachments
Smallest triggering file (6.36 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2016-03-15 13:02 UTC, Tim Allison

Description Tim Allison 2016-03-15 13:02:20 UTC
Created attachment 33675
Smallest triggering file

Thanks to Dominik's Common Crawl download tool, we now have many, many more OOXML files for testing in Tika's regression corpus.

With POI 3.14 we're now getting the following exception on roughly 40 of those files. The created date in the core properties carries a timezone offset (+02:00) rather than a trailing Z.

I'm sorry I didn't catch this before the release!


java.lang.IllegalArgumentException: Date for created could not be parsed: 2012-05-21T12:56:36+02:00
	at org.apache.poi.openxml4j.opc.internal.PackagePropertiesPart.setCreatedProperty(PackagePropertiesPart.java:393)
	at org.apache.poi.openxml4j.opc.internal.unmarshallers.PackagePropertiesUnmarshaller.unmarshall(PackagePropertiesUnmarshaller.java:124)
	at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:726)
	at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:230)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:69)

...
Caused by: org.apache.poi.openxml4j.exceptions.InvalidFormatException: Date 2012-05-21T12:56:36+02:00Z not well formated, expected format yyyy-MM-dd'T'HH:mm:ss'Z' or yyyy-MM-dd'T'HH:mm:ss.SS'Z'
	at org.apache.poi.openxml4j.opc.internal.PackagePropertiesPart.setDateValue(PackagePropertiesPart.java:575)
	at org.apache.poi.openxml4j.opc.internal.PackagePropertiesPart.setCreatedProperty(PackagePropertiesPart.java:391)
	... 22 more
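
For illustration, the mismatch can be reproduced outside POI. This is a minimal sketch, not POI's code: the class and method names are hypothetical, and the only thing taken from the trace is the first expected pattern, yyyy-MM-dd'T'HH:mm:ss'Z'. Because 'Z' is a quoted literal in that pattern, a value carrying a numeric offset cannot match it, with or without a "Z" appended.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical reproduction; not part of POI.
public class LiteralZSketch {

    // Returns true if the value matches the pattern quoted in the exception message.
    static boolean parsesWithLiteralZ(String value) {
        // 'Z' is a quoted literal here, so the parser expects that exact character
        // immediately after the seconds field.
        SimpleDateFormat format =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        try {
            format.parse(value);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A UTC value with a trailing Z matches the pattern.
        System.out.println(parsesWithLiteralZ("2012-05-21T12:56:36Z"));
        // The value from the report has "+02:00" where the literal Z is expected.
        System.out.println(parsesWithLiteralZ("2012-05-21T12:56:36+02:00"));
        // The string shown in the "Caused by" message fails for the same reason.
        System.out.println(parsesWithLiteralZ("2012-05-21T12:56:36+02:00Z"));
    }
}
```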
Comment 1 Tim Allison 2016-03-16 17:55:50 UTC
r1735270

More code than I would have liked. We should be able to simplify it once we move to Java 7, where the SimpleDateFormat "XXX" pattern handles timezone offsets that contain a colon. We might also want to move to a ThreadLocal static SimpleDateFormat, but I doubt that will buy us much...
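
A rough sketch of what the Java 7 simplification might look like, combining the "XXX" pattern with a ThreadLocal SimpleDateFormat. This is not the code committed in r1735270; the class name, field name, and helper methods are assumptions made for illustration only.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch; not the change committed to POI.
public class OffsetDateSketch {

    // SimpleDateFormat is not thread-safe, so each thread gets its own instance.
    private static final ThreadLocal<SimpleDateFormat> OFFSET_FORMAT =
            new ThreadLocal<SimpleDateFormat>() {
                @Override
                protected SimpleDateFormat initialValue() {
                    // "XXX" (Java 7+) accepts ISO 8601 offsets such as +02:00,
                    // and also accepts a bare "Z" as UTC when parsing.
                    return new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssXXX", Locale.ROOT);
                }
            };

    static Date parse(String value) throws ParseException {
        return OFFSET_FORMAT.get().parse(value);
    }

    // Renders a Date back in the Z-terminated UTC form.
    static String toUtcString(Date date) {
        SimpleDateFormat utc =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        utc.setTimeZone(TimeZone.getTimeZone("UTC"));
        return utc.format(date);
    }

    public static void main(String[] args) throws ParseException {
        Date created = parse("2012-05-21T12:56:36+02:00");
        System.out.println(toUtcString(created)); // prints 2012-05-21T10:56:36Z
    }
}
```

With this pattern the offset form and the Z form of the same instant parse to equal Date values, so a single format could cover both cases for values without fractional seconds.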
Comment 2 Dominik Stadler 2016-03-21 14:38:31 UTC
*** Bug 59204 has been marked as a duplicate of this bug. ***