Basic Text Extraction
---------------------

For basic text extraction, use
org.apache.poi.hslf.extractor.PowerPointExtractor. It accepts a file or an
input stream. Its getText() method returns the text from the slides, from
the notes, or from both.

Specific Text Extraction
------------------------

To get specific pieces of text, first create an
org.apache.poi.hslf.usermodel.SlideShow from an
org.apache.poi.hslf.HSLFSlideShow, which accepts a file or an input stream.
Use getSlides() and getNotes() to get the slides and notes. These can be
queried for their page ID (though they should already be returned in the
right order). You can also call getTextRuns() on them to get their blocks
of text. From a TextRun, you can extract the text and check what type of
text it is (e.g. Body, Title).

Changing Text
-------------

It is possible to change the text via TextRun.setText(String). However, if
the length of the text changes, the file will break: PowerPoint stores
internal file references as byte offsets, and not all of these are yet
updated when the size changes.

Guide to key classes
--------------------

org.apache.poi.hslf.HSLFSlideShow
  Handles reading in and writing out files. Generates a tree of the records
  in the file.

org.apache.poi.hslf.usermodel.SlideShow
  Builds up model entries from the records, and presents a user-facing view
  of the file.

org.apache.poi.hslf.extractor.PowerPointExtractor
  Uses the model code to allow extraction of text from files.
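
Illustration: stale byte offsets
--------------------------------

The limitation described under "Changing Text" can be illustrated with a
small, self-contained sketch. This is a toy model only: the OffsetDemo class,
its methods, and the two-record layout are hypothetical and are not part of
the PowerPoint file format or of the HSLF API. It assumes nothing more than
records stored back to back, plus a saved table of the byte offsets at which
they start.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Toy model only: NOT the PowerPoint file format and NOT the HSLF API.
class OffsetDemo {

    /** Concatenates the records into a single byte stream. */
    static byte[] pack(String[] records) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String r : records) {
            byte[] b = r.getBytes(StandardCharsets.US_ASCII);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    /** Returns the byte offset at which each record starts. */
    static int[] offsets(String[] records) {
        int[] result = new int[records.length];
        int pos = 0;
        for (int i = 0; i < records.length; i++) {
            result[i] = pos;
            pos += records[i].getBytes(StandardCharsets.US_ASCII).length;
        }
        return result;
    }

    /** Reads 'length' bytes at 'offset', as a reader following a stored reference would. */
    static String readAt(byte[] data, int offset, int length) {
        return new String(data, offset, length, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String[] original = {"Title", "Body"};
        int[] stored = offsets(original); // {0, 5}, the offsets saved "in the file"

        // Same-length edit: the stored offset still points at the second record.
        byte[] sameLength = pack(new String[] {"TITLE", "Body"});
        System.out.println(readAt(sameLength, stored[1], 4)); // prints "Body"

        // Longer edit with the offsets left untouched: the reference is now stale.
        byte[] longer = pack(new String[] {"Long title", "Body"});
        System.out.println(readAt(longer, stored[1], 4)); // prints "titl"
    }
}
```

In this toy model, replacing text with a string of the same length leaves the
saved offsets valid, while a longer string shifts every later record and the
saved offset then points into the middle of the edited text.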