Bug 50750 - Support MS OneNote file format
Summary: Support MS OneNote file format
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: unspecified
Hardware: All All
Importance: P2 enhancement with 4 votes
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2011-02-10 11:55 UTC by Jan Høydahl
Modified: 2015-11-23 09:20 UTC
CC List: 1 user


Description Jan Høydahl 2011-02-10 11:55:07 UTC
Support extracting text content from .one files, as per this file format spec: http://msdn.microsoft.com/en-us/library/dd924743(v=office.12).aspx
Comment 1 Nick Burch 2011-02-10 13:12:59 UTC
Any chance you could create a few sample documents and upload them?

Ideally we'd want, say, 2 or 3 files. For each one, we'd also want a text file with the textual contents of the file (so we can make sure we get most of the contents), and possibly also a screenshot of the file when it's open in OneNote (so we can get a feel for how the text might come out).
Comment 2 Jan Høydahl 2011-02-14 13:15:11 UTC
Here are some sample OneNote files in a zip file:


Zip contains:

The files are the default sample document in OneNote 2010. The document has one section and 2 pages, and was created with OneNote 2010. The 2007 file is exported from OneNote 2010. The .onepkg file has the same contents as the other files, but saved as a package. The txt file was created by selecting all text on the page and copying it, so you get an idea of what is graphics and what is text. The PDF gives a visual impression of the original workbook.
Comment 3 Nick Burch 2011-02-14 14:24:10 UTC
Thanks for these.

I can't promise I'll be able to work on this very soon, but I should be able to add in Tika support just as soon as I've done the POI bit...