Bug 59183

Summary: New exception parsing dates with timezone offsets in OPC with POI 3.14
Product: POI
Reporter: Tim Allison <tallison>
Component: OPC
Assignee: POI Developers List <dev>
Severity: normal
CC: istvan.foldhazi
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: Smallest triggering file

Description Tim Allison 2016-03-15 13:02:20 UTC
Created attachment 33675
Smallest triggering file

Thanks to Dominik's Common Crawl download tool, we now have many, many more OOXML files for testing in Tika's regression corpus.

We're now getting the following exception in roughly 40 files with POI 3.14.

I regret not catching this before the release!

java.lang.IllegalArgumentException: Date for created could not be parsed: 2012-05-21T12:56:36+02:00
	at org.apache.poi.openxml4j.opc.internal.PackagePropertiesPart.setCreatedProperty(PackagePropertiesPart.java:393)
	at org.apache.poi.openxml4j.opc.internal.unmarshallers.PackagePropertiesUnmarshaller.unmarshall(PackagePropertiesUnmarshaller.java:124)
	at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:726)
	at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:230)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:69)

Caused by: org.apache.poi.openxml4j.exceptions.InvalidFormatException: Date 2012-05-21T12:56:36+02:00Z not well formated, expected format yyyy-MM-dd'T'HH:mm:ss'Z' or yyyy-MM-dd'T'HH:mm:ss.SS'Z'
	at org.apache.poi.openxml4j.opc.internal.PackagePropertiesPart.setDateValue(PackagePropertiesPart.java:575)
	at org.apache.poi.openxml4j.opc.internal.PackagePropertiesPart.setCreatedProperty(PackagePropertiesPart.java:391)
	... 22 more

Comment 1 Tim Allison 2016-03-16 17:55:50 UTC
More code than I would have liked.  We should be able to simplify once we move to Java 7, whose "XXX" pattern in SimpleDateFormat handles timezone offsets with colons.  We might want to move to a ThreadLocal static SimpleDateFormat, but I doubt that will buy us much...
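
For illustration only, here is a minimal sketch of what the Java 7 simplification might look like. It is not the code committed to POI: the class name OffsetDateSketch and the method parseWithOffset are hypothetical. The sketch assumes that only second-precision values such as the one in the stack trace need to be handled, and it keeps the formatter in a ThreadLocal because SimpleDateFormat is not thread-safe.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical sketch, not POI's actual implementation.
public class OffsetDateSketch {

    // One formatter per thread, since SimpleDateFormat is not thread-safe.
    // "XXX" (Java 7+) parses ISO 8601 offsets with a colon, e.g. "+02:00",
    // and also accepts a literal "Z" for UTC.
    private static final ThreadLocal<SimpleDateFormat> OFFSET_FORMAT =
            new ThreadLocal<SimpleDateFormat>() {
                @Override
                protected SimpleDateFormat initialValue() {
                    SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssXXX");
                    f.setLenient(false);
                    return f;
                }
            };

    // Parses a date such as "2012-05-21T12:56:36+02:00" or "2012-05-21T10:56:36Z".
    static Date parseWithOffset(String value) {
        try {
            return OFFSET_FORMAT.get().parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Date could not be parsed: " + value, e);
        }
    }

    public static void main(String[] args) {
        Date created = parseWithOffset("2012-05-21T12:56:36+02:00");

        // Render the instant in UTC to show the offset was applied.
        SimpleDateFormat utc = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        utc.setTimeZone(TimeZone.getTimeZone("UTC"));
        System.out.println(utc.format(created)); // prints 2012-05-21T10:56:36Z
    }
}
```

With a single "XXX" pattern, both the offset form from the failing files and the plain 'Z' form would go through the same code path, which is where the simplification would come from. Fractional seconds would still need a second pattern.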

Comment 2 Dominik Stadler 2016-03-21 14:38:31 UTC
*** Bug 59204 has been marked as a duplicate of this bug. ***