Initial support for powerpoint, as described in post to poi-user. Supports finding slides and meta slides (by a process of scanning the file for known byte sequences), and getting text out from slides and meta slides.
Created attachment 14278 [details] Code for initial powerpoint support
Created attachment 14332 [details] Updated code to support powerpoint New version, with help from Shaheed Haque. Instead of blindly scanning the file looking for interesting byte sequences, now mostly understands how records work, and tries to find interesting ones that way.
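For readers following along, the record-based approach can be sketched roughly as below. This is not the attached code: `RecordHeaderWalker` and its methods are hypothetical names, and the sketch assumes the usual 8-byte PowerPoint record header (2 bytes of version/instance, 2 bytes of record type, 4 bytes of length, all little-endian), with a low nibble of 0xF in the first field marking a container whose payload is itself a run of records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the attached patch.
class RecordHeaderWalker {
    /** Reads an unsigned little-endian 16-bit value. */
    static int u16(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    /** Reads an unsigned little-endian 32-bit value. */
    static long u32(byte[] b, int off) {
        return (b[off] & 0xFFL)
             | ((b[off + 1] & 0xFFL) << 8)
             | ((b[off + 2] & 0xFFL) << 16)
             | ((b[off + 3] & 0xFFL) << 24);
    }

    /** Lists record types found in data[start, end), descending into containers. */
    static List<Integer> listTypes(byte[] data, int start, int end) {
        List<Integer> types = new ArrayList<Integer>();
        int pos = start;
        while (pos + 8 <= end) {
            int verInst = u16(data, pos);
            int type = u16(data, pos + 2);
            long len = u32(data, pos + 4);
            if (len > end - (pos + 8)) {
                break; // declared length runs past the buffer: stop rather than guess
            }
            types.add(type);
            // Assumption: low nibble 0xF marks a container record.
            if ((verInst & 0x0F) == 0x0F) {
                types.addAll(listTypes(data, pos + 8, pos + 8 + (int) len));
            }
            pos += 8 + (int) len;
        }
        return types;
    }

    public static void main(String[] args) {
        // One container (type 1000) holding one atom (type 4000) with 4 payload bytes.
        byte[] data = {
            0x0F, 0x00, (byte) 0xE8, 0x03, 0x0C, 0x00, 0x00, 0x00,
            0x00, 0x00, (byte) 0xA0, 0x0F, 0x04, 0x00, 0x00, 0x00,
            1, 2, 3, 4
        };
        System.out.println(listTypes(data, 0, data.length));
    }
}
```

Walking headers this way skips over unknown records by their declared length, so there is no need to scan for byte sequences.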
Created attachment 14689 [details] Next version of PPT support Next release of powerpoint support. Now able to write files back out to disk, and have PowerPoint still be able to open them. Also includes some bug fixes to text runs. Still doesn't let you edit files though! Can only load, extract text, and save again.
Created attachment 14701 [details] Next version of PPT support Functionally quite similar to the last version. A few more records are implemented, but not enough to edit text yet. Also has a few bug fixes.
Created attachment 14760 [details] Next version of PPT support Functionally very similar to the last version. Better structure for how Containers write themselves out, which should speed up development. Bug fix for some NotesAtom entries being longer than 6 bytes, which broke saving.
Created attachment 14825 [details] Next version of PPT support Slide text is now properly record based, as is the corresponding model code. It is now possible to edit the text of parts of PowerPoint slides (not notes), *BUT* only if you don't change the length of the text!
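As a rough illustration of the same-length restriction (not the attached code), the sketch below overwrites text bytes in place and refuses any edit whose encoded length differs from the original. `SameLengthTextEdit`, `replaceInPlace`, and the single-byte ISO-8859-1 encoding are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the attached patch.
class SameLengthTextEdit {
    /**
     * Overwrites textLength bytes at textOffset with newText, but only when the
     * encoded replacement is exactly the same length. Returns false (and leaves
     * the buffer untouched) otherwise, since nothing here updates length fields.
     */
    static boolean replaceInPlace(byte[] record, int textOffset, int textLength, String newText) {
        // Assumption: text is stored one byte per character.
        byte[] encoded = newText.getBytes(StandardCharsets.ISO_8859_1);
        if (encoded.length != textLength) {
            return false;
        }
        System.arraycopy(encoded, 0, record, textOffset, textLength);
        return true;
    }
}
```

For example, replacing "Hello" with "World" succeeds, while replacing it with "Hi" is rejected.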
Nick, thanks for all the work! A few lines of docs would be useful. I looked at the code, but couldn't figure out where to start, i.e., couldn't figure out very easily how to use the API. I'm sure it's obvious once you know where to look, but.. A couple of nits: 1. Do the methods of TextMunger do anything that the methods of o.a.p.util.StringUtil do not do? We're quite paranoid about duplication; it's a pain to maintain. 2. HSLFSlideShow should probably be in a usermodel package? ..Later: reading the javadocs makes things clearer, but a quickstart guide would probably still be useful. Also, here are some simple tests you can take inspiration from. Look at o.a.p.hssf.records.TestStringRecord for a very simple low-level test. Look at o.a.p.hssf.model.TestSheet for a higher-level test on the model. Look at o.a.p.hssf.usermodel.TestHSSFCell for a high-level test.
Created attachment 15088 [details] Next version of PPT support Much better usermodel and model code. Several bug fixes, and improved text extraction.
Created attachment 15089 [details] Unit tests for powerpoint code First version of unit tests for powerpoint code. Tests writing out, text extraction, and some parts of the record layer (but not yet all)
Created attachment 15090 [details] Quick guide to using the PowerPoint code Basic introduction to using the PowerPoint code. Describes how to extract text, how to change text, warnings about changing text, how to get Slides, key classes etc.
TestReWrite fails. Any ideas?

single-scratchpad-test:
    [junit] Running org.apache.poi.hslf.TestReWrite
    [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 0.161 sec
    [junit] Testsuite: org.apache.poi.hslf.TestReWrite
    [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 0.161 sec
    [junit] Testcase: warning took 0.006 sec
    [junit]     FAILED
    [junit] Exception in constructor: testWritesOutTheSame (java.lang.NegativeArraySizeException
    [junit]     at org.apache.poi.ddf.EscherClientAnchorRecord.fillFields(EscherClientAnchorRecord.java:74)
    [junit]     at org.apache.poi.ddf.EscherContainerRecord.fillFields(EscherContainerRecord.java:55)
    [junit]     at org.apache.poi.ddf.EscherContainerRecord.fillFields(EscherContainerRecord.java:55)
    [junit]     at org.apache.poi.ddf.EscherContainerRecord.fillFields(EscherContainerRecord.java:55)
    [junit]     at org.apache.poi.hslf.record.PPDrawing.findEscherChildren(PPDrawing.java:108)
    [junit]     at org.apache.poi.hslf.record.PPDrawing.<init>(PPDrawing.java:85)
    [junit]     at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:159)
    [junit]     at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:102)
    [junit]     at org.apache.poi.hslf.record.DummyRecordWithChildren.<init>(DummyRecordWithChildren.java:50)
    [junit]     at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:155)
    [junit]     at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:102)
    [junit]     at org.apache.poi.hslf.HSLFSlideShow.readFIB(HSLFSlideShow.java:173)
    [junit]     at org.apache.poi.hslf.HSLFSlideShow.<init>(HSLFSlideShow.java:102)
    [junit]     at org.apache.poi.hslf.TestReWrite.<init>(TestReWrite.java:44)
The failure is because no one has applied my patch to ddf.EscherClientAnchorRecord, in bug #34787. Without that patch, ddf.EscherClientAnchorRecord will do nasty things, because the current version assumes a different record size from what's really there. Hopefully, if you try again after applying the ddf patch, the test will work.
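To show how a size mismatch of this kind can surface as a NegativeArraySizeException, here is a minimal sketch. It is not the ddf code or the patch in bug #34787: `RemainingBytesGuard`, `readRemaining`, and the notion of an "assumed fixed size" are hypothetical. If a parser subtracts the size it expects from the length the record declares and allocates an array of the result, a shorter-than-expected record yields a negative size; checking first gives a clearer error.

```java
// Hypothetical helper for illustration only; not the actual ddf fix.
class RemainingBytesGuard {
    /**
     * Returns the bytes that follow the fixed-size part of a record.
     * Fails with a descriptive message when the record is shorter than the
     * parser assumes, instead of attempting new byte[negative].
     */
    static byte[] readRemaining(byte[] data, int offset, int declaredLength, int assumedFixedSize) {
        int remaining = declaredLength - assumedFixedSize;
        if (remaining < 0) {
            throw new IllegalArgumentException("Record declares " + declaredLength
                    + " bytes but the parser assumes at least " + assumedFixedSize);
        }
        byte[] extra = new byte[remaining];
        System.arraycopy(data, offset + assumedFixedSize, extra, 0, remaining);
        return extra;
    }
}
```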
OK thanks, that makes sense. It's been a busy couple of days for me, and I'll get to it in a day or two.. sorry.
Ok, so I now get a failure. Any ideas? Thanks! Are the ppt files attached to the testcases correct?

single-scratchpad-test:
    [junit] Testsuite: org.apache.poi.hslf.TestRecordCounts
    [junit] Tests run: 3, Failures: 1, Errors: 0, Time elapsed: 0.17 sec
    [junit] Testcase: testSheetsCount took 0.008 sec
    [junit]     FAILED
    [junit] expected:<2> but was:<0>
    [junit] junit.framework.AssertionFailedError: expected:<2> but was:<0>
    [junit]     at junit.framework.Assert.fail(Assert.java:47)
    [junit]     at junit.framework.Assert.failNotEquals(Assert.java:282)
    [junit]     at junit.framework.Assert.assertEquals(Assert.java:64)
    [junit]     at junit.framework.Assert.assertEquals(Assert.java:201)
    [junit]     at junit.framework.Assert.assertEquals(Assert.java:207)
    [junit]     at org.apache.poi.hslf.TestRecordCounts.testSheetsCount(TestRecordCounts.java:54)
Created attachment 15187 [details] New Unit tests for powerpoint code Updated the unit tests so they all actually work (I was sure I'd done that last time, but it seems I was a muppet and missed one test).
Committed. Thanks Nick! Please verify. Please provide new stuff as diffs against CVS, attached to a new bug.