Issue originally reported on Tika 1.17 (using POI 3.17): https://issues.apache.org/jira/browse/TIKA-2666
Initial investigation points to POI.

*********

In the attached screenshot PPT_lastPrinted_00.png, the last-print date was set to 00:00. But when Tika, using POI 3.17, extracts metadata from this document, the last-print date is in the year 27321!

Last-Printed: 27321-01-23T08:20:12Z
meta:print-date: 27321-01-23T08:20:12Z

Attachments:
Genetic_Factors_and_the_Directionality_of.ppt : PowerPoint 97-2003
PPT_lastPrinted_00.png : properties of the PPT
tika-app-1.17.metadata.txt : metadata extracted by Tika 1.17
Created attachment 35965: Tika Metadata
Created attachment 35966: PPT properties
The file Genetic_Factors_and_the_Directionality_of.ppt, needed to reproduce the issue, is too large to attach. Is there another way to submit this file?
(In reply to Isabelle Giguere from comment #3)
> File Genetic_Factors_and_the_Directionality_of.ppt, to reproduce the issue,
> is too large to attach.
>
> Is there another way to submit this file ?

Available from https://issues.apache.org/jira/browse/TIKA-2666
LibreOffice displays "25.11.31134, 15:56:23" with my system locale (Germany) and the current timezone (UTC+2 / CEST).

I fixed the unsigned handling, and POI now returns the date "Sun Nov 25 14:56:23 CET 31134", rendered in my system's locale and timezone. I'm puzzled why LibreOffice shows an additional hour of offset, but it looks like the same error as in [1], i.e. using the current daylight-saving offset for the whole year.

Deciding that this is not a valid date and that a default of 00:00 should be used instead is not POI's business; Tika needs to handle unrealistic values.

Applied via r1833668

[1] https://stackoverflow.com/questions/4605983/io-file-getlastaccesstime-is-off-by-one-hour
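A minimal sketch of what the "unsigned handling" is about, assuming the last-printed property is stored as a standard Windows FILETIME (a 64-bit count of 100-ns ticks since 1601-01-01 UTC). This is not POI's actual code; the class and method names (FiletimeSketch, combine, toInstantUnsigned, toInstantSigned) are hypothetical. After the fix POI reports the year 31134, which lies beyond 30828-09-14, the largest date a signed 64-bit FILETIME can hold, so the raw value in this file must have its top bit set. Read as a signed long, the same bits land before 1601.

```java
import java.time.Instant;
import java.time.ZoneOffset;

/**
 * Hypothetical sketch (not POI's code): converting a Windows FILETIME,
 * i.e. 100-ns ticks since 1601-01-01T00:00:00Z, to a java.time.Instant.
 */
final class FiletimeSketch {
    // Seconds from the FILETIME epoch (1601-01-01) to the Unix epoch (1970-01-01).
    static final long EPOCH_DIFF_SECONDS = 11_644_473_600L;
    // FILETIME counts 100-nanosecond ticks, so 10^7 ticks make one second.
    static final long TICKS_PER_SECOND = 10_000_000L;

    /** Joins the high and low 32-bit halves; masking keeps the low half from sign-extending. */
    static long combine(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    /** Treats the 64 bits as unsigned, so values with the top bit set stay in the far future. */
    static Instant toInstantUnsigned(long filetime) {
        long seconds = Long.divideUnsigned(filetime, TICKS_PER_SECOND);
        long ticks = Long.remainderUnsigned(filetime, TICKS_PER_SECOND);
        return Instant.ofEpochSecond(seconds - EPOCH_DIFF_SECONDS, ticks * 100);
    }

    /** Signed interpretation, for comparison: a top-bit-set value maps to before 1601. */
    static Instant toInstantSigned(long filetime) {
        long seconds = Math.floorDiv(filetime, TICKS_PER_SECOND);
        long ticks = Math.floorMod(filetime, TICKS_PER_SECOND);
        return Instant.ofEpochSecond(seconds - EPOCH_DIFF_SECONDS, ticks * 100);
    }

    public static void main(String[] args) {
        long filetime = combine(0x80000000, 0); // only the top bit set
        System.out.println(toInstantUnsigned(filetime).atZone(ZoneOffset.UTC).getYear());
        System.out.println(toInstantSigned(filetime).isBefore(Instant.parse("1601-01-01T00:00:00Z")));
    }
}
```

Running main should print 30828 and then true. Long.divideUnsigned and Long.remainderUnsigned (Java 8+) give the unsigned reading without resorting to BigInteger.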
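The extra hour in LibreOffice can be illustrated, as a hypothesis only, by rendering the same instant twice: once with the Europe/Berlin zone rules, which pick the offset valid on that date (CET, UTC+1, in November), and once with a fixed UTC+2 offset, as if the current CEST offset applied all year. The UTC time 13:56:23 is inferred from POI's "14:56:23 CET" output; OffsetSketch and its methods are hypothetical names, and this says nothing about how LibreOffice actually converts the value.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

/** Hypothetical illustration of the "one hour off" effect described in [1]. */
final class OffsetSketch {
    /** Wall-clock time using the zone's rules for that instant (CET in late November). */
    static LocalDateTime withZoneRules(ZonedDateTime utc, ZoneId zone) {
        return LocalDateTime.ofInstant(utc.toInstant(), zone);
    }

    /** Wall-clock time using one fixed offset for the whole year, e.g. today's CEST +02:00. */
    static LocalDateTime withFixedOffset(ZonedDateTime utc, ZoneOffset offset) {
        return LocalDateTime.ofInstant(utc.toInstant(), offset);
    }

    public static void main(String[] args) {
        ZonedDateTime utc = ZonedDateTime.of(31134, 11, 25, 13, 56, 23, 0, ZoneOffset.UTC);
        System.out.println(withZoneRules(utc, ZoneId.of("Europe/Berlin")));
        System.out.println(withFixedOffset(utc, ZoneOffset.ofHours(2)));
    }
}
```

main should print +31134-11-25T14:56:23 (matching POI) and +31134-11-25T15:56:23 (matching LibreOffice), which is consistent with, but does not prove, the fixed-offset explanation.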