? ppt-file-format.xml
Index: book.xml
===================================================================
RCS file: /home/cvspublic/jakarta-poi/src/documentation/content/xdocs/hslf/book.xml,v
retrieving revision 1.1
diff -u -r1.1 book.xml
--- book.xml 28 May 2005 19:28:22 -0000 1.1
+++ book.xml 3 Jun 2005 14:04:47 -0000
@@ -13,6 +13,7 @@
Index: quick-guide.xml
===================================================================
RCS file: /home/cvspublic/jakarta-poi/src/documentation/content/xdocs/hslf/quick-guide.xml,v
retrieving revision 1.1
diff -u -r1.1 quick-guide.xml
--- quick-guide.xml 28 May 2005 19:28:22 -0000 1.1
+++ quick-guide.xml 3 Jun 2005 14:04:47 -0000
@@ -15,8 +15,9 @@
For basic text extraction, make use of
org.apache.poi.extractor.PowerPointExtractor
. It accepts a file or an input
-stream. The getText()
method can be used to get the text from the slides,
-from the notes, or from both.
+stream. The getText()
method can be used to get the text from the slides, and the getNotes()
method can be used to get the text
+from the notes. Finally, getText(true,true)
will get the text
+from both.
If speed is the most important thing for you, you don't care
+ about getting duplicate blocks of text, you don't care about
+ getting text from master sheets, and you don't care about getting
+ old text, then
+ org.apache.poi.extractor.QuickButCruddyTextExtractor
+ might be of use.
QuickButCruddyTextExtractor doesn't use the normal record
+ parsing code, instead it uses a tree structure blind search
+ method to get all text holding records. You will get all the text,
+ including lots of text you normally wouldn't ever want. However,
+ you will get it back very very fast!
+There are two ways of getting the text back.
+ getTextAsString()
will return a single string with all
+ the text in it. getTextAsVector()
will return a
+ vector of strings, one for each text record found in the file.
+
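The blind search described above can be illustrated with a minimal, self-contained sketch. It is not the code of QuickButCruddyTextExtractor. The class name `BlindTextSearch` and the method `findText` are hypothetical, and the sketch assumes the standard PowerPoint record layout: an 8-byte little-endian header (version/instance, type, length), container records marked by a version nibble of 0xF, and text held in TextCharsAtom (type 4000, UTF-16LE) and TextBytesAtom (type 4008, one byte per character) records.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of POI.
final class BlindTextSearch {
    // Record types assumed from the PowerPoint binary format.
    static final int TEXT_CHARS_ATOM = 4000; // UTF-16LE text
    static final int TEXT_BYTES_ATOM = 4008; // one byte per character

    /** Walks the record stream and returns one string per text record found. */
    static List<String> findText(byte[] data) {
        List<String> found = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 0;
        while (pos + 8 <= data.length) {
            int verAndInstance = buf.getShort(pos) & 0xFFFF;
            int type = buf.getShort(pos + 2) & 0xFFFF;
            long len = buf.getInt(pos + 4) & 0xFFFFFFFFL;
            pos += 8; // step past the header

            if ((verAndInstance & 0x0F) == 0x0F) {
                // Container: its children start right here, so descending
                // the tree is just a matter of carrying on from this point.
                continue;
            }
            if (len > data.length - pos) {
                break; // truncated or corrupt record; stop rather than guess
            }
            int n = (int) len;
            if (type == TEXT_CHARS_ATOM) {
                found.add(new String(data, pos, n, StandardCharsets.UTF_16LE));
            } else if (type == TEXT_BYTES_ATOM) {
                found.add(new String(data, pos, n, StandardCharsets.ISO_8859_1));
            }
            pos += n; // skip the atom's body, whether or not it held text
        }
        return found;
    }
}
```

Because the walk never builds model objects and never asks which slide, master or obsolete revision a record belongs to, it returns every text record it meets, which is consistent with the duplicate and old text mentioned above. Returning a list with one entry per record mirrors the shape of getTextAsVector(); joining the entries would give the equivalent of getTextAsString().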
It is possible to change the text via TextRun.setText(String)
. However, if
-the length of the text is changed, things will break because PowerPoint has
-internal file references in byte offsets, which are not yet all updated when
-the size changes.
+
It is possible to change the text via
+ TextRun.setText(String)
. However, if the length of
+ the text is changed, things will break because PowerPoint has
+ internal file references in byte offsets. We currently update all
+ of these byte references that we know about when writing out, but
+ there are a few more still to be found.
org.apache.poi.hslf.HSLFSlideShow
- Handles reading in and writing out files. Generates a tree of the records
- in the file
+ Handles reading in and writing out files. Calls
+ org.apache.poi.hslf.record.record
to build a tree
+ of all the records in the file, which it allows access to.
+ org.apache.poi.hslf.record.record
+ Base class of all records. Also provides the main record generation
+ code, which will build up a tree of records for a file.
org.apache.poi.hslf.usermode.SlideShow
Builds up model entries from the records, and presents a user facing
@@ -55,4 +82,4 @@