Created attachment 22394
Simple test doc with body, header/footer, annotations, footnotes and endnotes

Using a small trick based on text length, it's possible to work out the location of a Range (body? header/footer? footnote? etc.).

For example, suppose we have 5 character runs:
1) coded in ASCII, ending at 2000
2) coded in Unicode, ending at 4050
3) coded in ASCII, ending at 2100
4) coded in Unicode, ending at 4200
5) coded in Unicode, ending at 4500
and that the ccpText field of the document they belong to is 2100.

We can tell whether a character run is Unicode or ASCII by comparing its length in characters (taken from the text) with its length in bytes (end - start). If every character run were in ASCII, the end values would be
1) 2000
2) 2025
3) 2100
4) 2100
5) 2250
and then, comparing *these* end values with ccpText, we can conclude that the character runs are
1) in body
2) in body
3) at end of body
4) at end of body
5) out of body, maybe in footnote

The same algorithm can be applied to all Range types (paragraph, section, and so on) and to all locations (body, header/footer, footnote, etc.).

To make this possible, it's necessary to:

1) add the following lines to the FileInformationBlock class

    public int getCcpFtn()
    {
        return _longHandler.getLong(FIBLongHandler.CCPFTN);
    }

    public int getCcpHdd()
    {
        return _longHandler.getLong(FIBLongHandler.CCPHDD);
    }

    public int getCcpAtn()
    {
        return _longHandler.getLong(FIBLongHandler.CCPATN);
    }

    public int getCcpEdn()
    {
        return _longHandler.getLong(FIBLongHandler.CCPEDN);
    }

to get the lengths in characters of footnotes, header/footer, annotations and endnotes respectively

2) create a new enum in the "usermodel" package to represent locations

    public enum Location
    {
        BODY, FOOTNOTE, HEADER_FOOTER, ANNOTATION, ENDNOTE, UNKNOWN;
    }

Instead of an enum, a series of int constants defined in Range could also be used.

3) add the new member variable to the Range class

    protected Location _location = null;

and the new method

    public Location getLocationType()
    {
        if(_location == null)
        {
            // x holds the end of this range, expressed in characters
            int x = 0;
            int charLen = this.text().length();
            int byteLen = _end - _start;

            if(byteLen == charLen)
                x = _end;       // ASCII: one byte per character
            else
                x = _end / 2;   // Unicode: two bytes per character

            FileInformationBlock fib = _doc.getFileInformationBlock();

            // each location ends where the cumulative character count ends
            if(x <= fib.getCcpText())
                _location = Location.BODY;
            else if(x <= fib.getCcpText() + fib.getCcpFtn())
                _location = Location.FOOTNOTE;
            else if(x <= fib.getCcpText() + fib.getCcpFtn() + fib.getCcpHdd())
                _location = Location.HEADER_FOOTER;
            else if(x <= fib.getCcpText() + fib.getCcpFtn() + fib.getCcpHdd() + fib.getCcpAtn())
                _location = Location.ANNOTATION;
            else if(x <= fib.getCcpText() + fib.getCcpFtn() + fib.getCcpHdd() + fib.getCcpAtn() + fib.getCcpEdn())
                _location = Location.ENDNOTE;
            else
                _location = Location.UNKNOWN;
        }

        return _location;
    }

This is a simple test class (perhaps it could be turned into a JUnit test case?) that I used to test my code:

    import java.io.FileInputStream;

    import javax.swing.JFileChooser;
    import javax.swing.JOptionPane;

    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.usermodel.Paragraph;
    import org.apache.poi.hwpf.usermodel.Range;

    public class QuickTest
    {
        public QuickTest()
        {
        }

        public static void main(String[] args)
        {
            try
            {
                JFileChooser jfc = new JFileChooser();
                int esito = jfc.showOpenDialog(null);

                if(esito != JFileChooser.APPROVE_OPTION)
                {
                    JOptionPane.showMessageDialog(null, "No file selected");
                }
                else
                {
                    String percorso = jfc.getSelectedFile().getAbsolutePath();
                    HWPFDocument doc = new HWPFDocument(new FileInputStream(percorso));
                    Range r = doc.getRange();

                    for(int i = 0; i < r.numParagraphs(); i++)
                    {
                        // Paragraph, CharacterRun, Section...: it works the same way
                        Paragraph cr = r.getParagraph(i);
                        System.out.println("<" + cr.text().trim() + "> " + cr.getLocationType());
                    }
                }
            }
            catch(Exception er)
            {
                er.printStackTrace();
            }
        }
    }

which, applied to the test doc I have attached, produces the output

    <BODY TEXT FRAGMENT 1> BODY
    <BODY TEXT FRAGMENT 2> BODY
    <> BODY
    <FOOTNOTE TEXT 1> FOOTNOTE
    <FOOTNOTE TEXT 2> FOOTNOTE
    <> FOOTNOTE
    <> HEADER_FOOTER
    <> HEADER_FOOTER
    <> HEADER_FOOTER
    <> HEADER_FOOTER
    <> HEADER_FOOTER
    <> HEADER_FOOTER
    <> HEADER_FOOTER
    <> HEADER_FOOTER
    <HEADER TEXT FRAGMENT 1> HEADER_FOOTER
    <HEADER TEXT FRAGMENT 2> HEADER_FOOTER
    <> HEADER_FOOTER
    <FOOTER TEXT FRAGMENT 1> HEADER_FOOTER
    <FOOTER TEXT FRAGMENT 2> HEADER_FOOTER
    <> HEADER_FOOTER
    <> HEADER_FOOTER
    <ANNOTATION 1> ANNOTATION
    <ANNOTATION 2> ANNOTATION
    <> ANNOTATION
    <ENDNOTE TEXT> ENDNOTE
    <> ENDNOTE
    <> UNKNOWN

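For anyone who wants to check the arithmetic without a POI build, here is a minimal standalone sketch of the same idea, using the character runs from the worked example. It does not use any POI classes: LocationSketch, normalizedEnd, classify and demo are names made up for this illustration, and the ccpFtn value of 200 is an assumed number chosen only so that the last run has somewhere to land (the example above only gives ccpText = 2100).

```java
import java.util.ArrayList;
import java.util.List;

public class LocationSketch {

    enum Location { BODY, FOOTNOTE, HEADER_FOOTER, ANNOTATION, ENDNOTE, UNKNOWN }

    // Convert a byte-based end offset into a character-based one:
    // Unicode runs take two bytes per character, ASCII runs take one.
    static int normalizedEnd(int end, boolean unicode) {
        return unicode ? end / 2 : end;
    }

    // ccp holds the character counts in file order:
    // {ccpText, ccpFtn, ccpHdd, ccpAtn, ccpEdn}
    static Location classify(int charEnd, int[] ccp) {
        Location[] order = {
            Location.BODY, Location.FOOTNOTE, Location.HEADER_FOOTER,
            Location.ANNOTATION, Location.ENDNOTE
        };
        int limit = 0;
        for (int i = 0; i < order.length; i++) {
            limit += ccp[i]; // cumulative end of this location, in characters
            if (charEnd <= limit) {
                return order[i];
            }
        }
        return Location.UNKNOWN;
    }

    // Runs from the worked example: {end offset, 1 if Unicode else 0}
    static List<String> demo() {
        int[][] runs = { {2000, 0}, {4050, 1}, {2100, 0}, {4200, 1}, {4500, 1} };
        // ccpText = 2100 comes from the example; ccpFtn = 200 is assumed
        int[] ccp = { 2100, 200, 0, 0, 0 };

        List<String> lines = new ArrayList<>();
        for (int i = 0; i < runs.length; i++) {
            int charEnd = normalizedEnd(runs[i][0], runs[i][1] == 1);
            lines.add("run " + (i + 1) + ": " + charEnd + " -> " + classify(charEnd, ccp));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

With these inputs the first four runs come out as BODY (ends 2000, 2025, 2100, 2100) and the fifth (end 2250) as FOOTNOTE, which matches the conclusions listed in the example. Summing the limits in a loop is just a compact way of writing the chain of else-if comparisons in getLocationType().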
Something similar to this is now in svn.

getRange() now only returns the main body, but getOverallRange() gives you the lot. There are also a few other Range getters for the other parts, such as header+footer.

The unicode stuff has also been made a bit nicer, so the range detection stuff is much simpler now too :)
(In reply to comment #1)
> Something similar to this is now in svn
>
> getRange() now only returns the main body, but getOverallRange() gives you the
> lot. There are also a few other Range getters, for the other things like
> header+footer
>
> The unicode stuff has also been made a bit nicer, so the range detection stuff
> is much simpler now too :)
>

I have poi-3.1-final, but I don't see getOverallRange() or any of the other Range getters.