Bug 53243

Summary: Extract Tables from word document
Product: POI
Reporter: ahmed <ayah683>
Component: HWPF
Assignee: POI Developers List <dev>
Status: RESOLVED WORKSFORME    
Severity: normal    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Attachments: word document file

Description ahmed 2012-05-16 15:43:34 UTC
Created attachment 28793
word document file

I used POI 3.8 to extract tables from a Word document,
but I can't get all of the tables in the document.
I wrote this code to do it:

import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Table;
import org.apache.poi.hwpf.usermodel.TableCell;
import org.apache.poi.hwpf.usermodel.TableRow;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// The class declaration was missing from the posted snippet; the name is a placeholder.
public class ExtractTables {

    public static void main(String[] args) {
        String fileName = "C:\\fjn3312r.doc";
        try (InputStream fis = new FileInputStream(fileName)) {
            POIFSFileSystem fs = new POIFSFileSystem(fis);
            HWPFDocument doc = new HWPFDocument(fs);

            Range range = doc.getRange();

            // Index of the last non-blank paragraph seen outside a table;
            // it is printed as the name of the table that follows it.
            int tblNameIdx = 0;
            for (int i = 0; i < range.numParagraphs(); i++) {
                Paragraph tablePar = range.getParagraph(i);
                String parText = tablePar.text();

                if (tablePar.isInTable()) {
                    Paragraph tableName = range.getParagraph(tblNameIdx);
                    System.out.println("Table name=====>>" + tableName.text());
                    Table table = range.getTable(tablePar);
                    for (int rowIdx = 0; rowIdx < table.numRows(); rowIdx++) {
                        TableRow row = table.getRow(rowIdx);

                        String rowText = "";
                        for (int colIdx = 0; colIdx < row.numCells(); colIdx++) {
                            TableCell cell = row.getCell(colIdx);
                            rowText = rowText + "\t" + cell.getParagraph(0).text();
                        }
                        System.out.println("Row----" + rowIdx + " ===>>" + rowText);
                    }
                    // Jump to the table's last paragraph; the loop's i++ then moves past it.
                    i += table.numParagraphs() - 1;
                } else if (!parText.matches("\\s*")) {
                    // Remember non-blank paragraphs only; whitespace-only ones are skipped.
                    tblNameIdx = i;
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Comment 1 Sergey Vladimirov 2012-11-06 16:54:50 UTC
Ahmed,

The first table is placed inside a text box, not in the "main" text. If you need its content, you have to navigate into the text box part of the document and extract the data from there.

The second and the last tables are extracted correctly.

Sergey
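
For reference, a minimal sketch of what reading the text box part could look like. It assumes the POI version in use exposes HWPFDocument.getMainTextboxRange() and that the range it returns can be walked the same way as the main range. The class name TextboxTables and the helpers printTables and stripCellMark are made up for illustration, and the sketch has not been run against the attached file.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Table;
import org.apache.poi.hwpf.usermodel.TableRow;

// Hypothetical example class; not part of POI.
public class TextboxTables {

    // Cell text ends with the cell mark (U+0007); remove it and trim
    // surrounding whitespace so the value prints cleanly.
    static String stripCellMark(String text) {
        return text.replace("\u0007", "").trim();
    }

    // Prints every table whose first paragraph is found in the given range.
    static void printTables(Range range) {
        for (int i = 0; i < range.numParagraphs(); i++) {
            Paragraph par = range.getParagraph(i);
            if (!par.isInTable()) {
                continue;
            }
            Table table = range.getTable(par);
            for (int r = 0; r < table.numRows(); r++) {
                TableRow row = table.getRow(r);
                StringBuilder line = new StringBuilder();
                for (int c = 0; c < row.numCells(); c++) {
                    line.append('\t').append(stripCellMark(row.getCell(c).text()));
                }
                System.out.println("Row " + r + ":" + line);
            }
            // Skip the rest of the table; the loop's i++ moves past its last paragraph.
            i += table.numParagraphs() - 1;
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the .doc file.
        try (InputStream in = new FileInputStream(args[0])) {
            HWPFDocument doc = new HWPFDocument(in);
            printTables(doc.getRange());            // tables in the main text
            printTables(doc.getMainTextboxRange()); // tables inside text boxes (assumed API)
        }
    }
}
```

Keeping the table walk in one method that takes a Range means the same logic can be pointed at any document part without duplicating the loop.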