Bug 52367 - PPT: text extraction missing "update automatically" dates/times
Alias: None
Product: POI
Classification: Unclassified
Component: HSLF
Version: 3.8-dev
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2011-12-19 18:13 UTC by Albert L.
Modified: 2012-01-15 12:07 UTC

sample of file that fails to be text extracted (142.00 KB, application/vnd.ms-powerpoint)
2011-12-19 18:13 UTC, Albert L.

Description Albert L. 2011-12-19 18:13:25 UTC
Created attachment 28087
sample of file that fails to be text extracted

When extracting text from a PPT file, dates/times that were inserted with the "update automatically" option are missing from the extracted text.
Comment 1 Yegor Kozlov 2012-01-15 12:07:35 UTC
Automatic date/time text cannot be extracted from .ppt files because it is not stored in the file. The viewing application (PowerPoint, OpenOffice, etc.) is responsible for interpreting "automatic" text elements and showing the current date/time. POI does not interpret the format: it can read and extract the data that is stored, but it does not support all features available in MS Office.

Note the difference between auto text in the PPT and PPTX formats: the PPTX format always stores cached text, i.e. the last value seen by PowerPoint, and this is the text that is extracted. The PPT format does not store a cached value.
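
To illustrate the PPTX side of that difference, here is a minimal sketch that uses only the JDK's XML parser, not POI. It assumes the auto-updating date is stored in the slide XML as a DrawingML `a:fld` element whose `type` starts with `datetime`, with the cached value in a nested `a:t`. The sample XML fragment, the field id, and the class and method names are made up for illustration and are not taken from the attached file or from POI's code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CachedFieldText {
    // DrawingML main namespace used by PPTX slide XML.
    static final String DRAWINGML_NS =
            "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Hypothetical helper: returns the cached text of every date/time field
    // found in the given slide XML fragment.
    static List<String> cachedDateTimeText(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        List<String> result = new ArrayList<>();
        NodeList fields = doc.getElementsByTagNameNS(DRAWINGML_NS, "fld");
        for (int i = 0; i < fields.getLength(); i++) {
            Element fld = (Element) fields.item(i);
            // Assumption: auto-updating dates use types such as "datetime1".
            if (!fld.getAttribute("type").startsWith("datetime")) {
                continue;
            }
            // The nested a:t holds the last value the authoring application wrote.
            NodeList texts = fld.getElementsByTagNameNS(DRAWINGML_NS, "t");
            if (texts.getLength() > 0) {
                result.add(texts.item(0).getTextContent());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up paragraph with one auto-updating date field.
        String xml = "<a:p xmlns:a=\"" + DRAWINGML_NS + "\">"
                + "<a:fld id=\"{11111111-2222-3333-4444-555555555555}\" type=\"datetime1\">"
                + "<a:rPr lang=\"en-US\"/><a:t>12/19/2011</a:t></a:fld>"
                + "</a:p>";
        System.out.println(cachedDateTimeText(xml));
    }
}
```

A binary .ppt file has no counterpart to the `a:t` value above, which is why an extractor has nothing to return for these fields unless it generates a date itself.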