Created attachment 21989: Table reading test class

The table-related text() method seems to be broken. Given a table cell, TableCell.text() returns not only that cell's text, but also text from the adjacent cell.

Given a sample 3x3 table whose cell texts are marked "CELLij" (i = row, j = column), with the top left cell left empty, the returned texts are as follows (from the Eclipse console output):

CELL[0][0]=CELL01
CELL[0][1]=CELL01CELL02
CELL[0][2]=CELL02
CELL[1][0]=CELL10CELL11
CELL[1][1]=CELL11CELL12
CELL[1][2]=CELL12
CELL[2][0]=CELL20CELL21
CELL[2][1]=CELL21CELL22
CELL[2][2]=CELL22

Only the last cell of each row comes back with the right text.

The simple test class I've used is

[code]
package org.apache.poi.hwpf;

import java.io.*;
import javax.swing.JFileChooser;
import javax.swing.JOptionPane;
import org.apache.poi.hwpf.usermodel.*;

public class QuickTest {
    public QuickTest() {
    }

    public static void main(String[] args) {
        try {
            // Let the user pick the Word document to inspect
            JFileChooser jfc = new JFileChooser();
            int esito = jfc.showOpenDialog(null);
            if(esito != JFileChooser.APPROVE_OPTION) {
                JOptionPane.showMessageDialog(null, "No file selected");
            } else {
                String percorso = jfc.getSelectedFile().getAbsolutePath();
                HWPFDocument doc = new HWPFDocument(new FileInputStream(percorso));
                Range r = doc.getRange();
                for(int i = 0; i < r.numParagraphs(); i++) {
                    Paragraph p = r.getParagraph(i);
                    if(p.isInTable()) {
                        Table t = r.getTable(p);
                        int cl = numCol(t);
                        System.out.println("Found " + t.numRows() + "x" + cl + " table");
                        dumpTab(t);
                        // Skip the remaining paragraphs of this table
                        i += t.numParagraphs() - 1;
                    }
                }
            }
        } catch(Exception er) {
            er.printStackTrace();
        }
    }

    // Returns the largest cell count found in any row of the table
    private static int numCol(Table t) {
        int col = 0;
        for(int i = 0; i < t.numRows(); i++) {
            if(t.getRow(i).numCells() > col)
                col = t.getRow(i).numCells();
        }
        return col;
    }

    // Prints the text of every cell, row by row
    private static void dumpTab(Table t) {
        for(int i = 0; i < t.numRows(); i++) {
            TableRow tr = t.getRow(i);
            for(int j = 0; j < tr.numCells(); j++) {
                TableCell tc = tr.getCell(j);
                System.out.println("CELL[" + i + "][" + j + "]=" + tc.text());
            }
        }
    }
}
[/code]

Sample test doc attached.
*** Bug 45167 has been marked as a duplicate of this bug. ***
As a fix, the old constructor arguments for TableCell need to be put back (end+1 -> end). This works fine (in TableRow.java):

_cells[cellIndex] = new TableCell(start, end, this, levelNum,
        _tprops.getRgtc()[cellIndex],
        _tprops.getRgdxaCenter()[cellIndex],
        _tprops.getRgdxaCenter()[cellIndex+1]-_tprops.getRgdxaCenter()[cellIndex]);
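To illustrate why a single "+1" on the end index could produce the output reported above, here is a toy model in plain Java. It does not use HWPF at all: CellRangeDemo, cellText and cellTexts are hypothetical names, each cell is reduced to one paragraph, and the model assumes the cell range treats its end paragraph index as inclusive and is clamped to the row. Whether the HWPF revision discussed here really uses that convention is not confirmed by this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CellRangeDemo {
    // Joins paragraphs[start..endInclusive]. The end index is clamped to the
    // last paragraph of the row, which is an assumption of this toy model.
    static String cellText(List<String> paragraphs, int start, int endInclusive) {
        int last = Math.min(endInclusive, paragraphs.size() - 1);
        StringBuilder sb = new StringBuilder();
        for (int i = start; i <= last; i++) {
            sb.append(paragraphs.get(i));
        }
        return sb.toString();
    }

    // Treats every paragraph as one cell and passes (index + endOffset)
    // as that cell's end index: 0 models "end", 1 models "end+1".
    static List<String> cellTexts(List<String> paragraphs, int endOffset) {
        List<String> cells = new ArrayList<>();
        for (int p = 0; p < paragraphs.size(); p++) {
            cells.add(cellText(paragraphs, p, p + endOffset));
        }
        return cells;
    }

    public static void main(String[] args) {
        // First row of the sample table: empty first cell, then CELL01, CELL02
        List<String> row = Arrays.asList("", "CELL01", "CELL02");
        System.out.println("end+1: " + cellTexts(row, 1));
        System.out.println("end:   " + cellTexts(row, 0));
    }
}
```

Under these assumptions the "end+1" variant yields CELL01, CELL01CELL02, CELL02 for the first row, matching the reported console output, while the "end" variant returns each cell's own text. If the range end were exclusive instead, the roles of "end" and "end+1" would swap.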
Any chance you could upload a simple file that shows up that problem? I'd like to commit a test at the same time as your fix, so we know it won't get broken again in the future. Your code looks like a good basis for a test, just need a file to drive it!
The test code can be tried with the same Word document attached to bug 44292 ("[PATCH] TableCell skip its last Paragraphs"). Given to the QuickTest class I posted, it produces this output:

Found 1x3 table
CELL[0][0]=One paragraph is ok
CELL[0][1]=First para is ok Second paragraph is skipped
CELL[0][2]=One paragraph is ok

This shows that bugs 44292 and 45062 (this one) no longer appear with the trunk version of the source code: table cell text is *right* and TableCell *does not skip* its last paragraphs. JUnit docet.
Just re-tested with svn trunk, and hwpf works properly
(In reply to comment #5)
> Just re-tested with svn trunk, and hwpf works properly

The problem seems to lie in the way HWPF handles the paragraphs and character runs of table cells. In TestProblems.java (the JUnit test case related to this bug and bug 44292) I tried adding the following lines to the checks on the first table cell:

//A - the *already existing* tests on first table cell
assertEquals(1, cell.numParagraphs());
assertEquals("One paragraph is ok\7", cell.getParagraph(0).text());
//A end
//B - mine, added
assertEquals(1, cell.numCharacterRuns());
assertEquals("One paragraph is ok\7", cell.getCharacterRun(0).text());
//B end
//C - mine, added
assertEquals("One paragraph is ok\7", cell.text());

The results:

1) If "end+1" is used in TableCell.java, blocks "A" and "C" pass but "B" fails: the paragraph list and text() work correctly, but character run retrieval does not.
2) If "end" is used in TableCell.java, block "B" passes and "A" and "C" fail: the character run list is retrieved correctly, but paragraph retrieval and text() are not.
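A side note on the expected strings in the assertions above: the trailing "\7" is Java's octal escape for the control character U+0007, which is what ends the cell text here. The sketch below shows that equivalence and a small helper for comparing cell text without the mark. CellMarkDemo and stripCellMark are hypothetical names for illustration, not part of HWPF.

```java
public class CellMarkDemo {
    // "\7" in a Java string literal is the octal escape for U+0007.
    static final char CELL_MARK = '\u0007';

    // Returns the text without a single trailing cell mark, if one is present.
    static String stripCellMark(String text) {
        if (!text.isEmpty() && text.charAt(text.length() - 1) == CELL_MARK) {
            return text.substring(0, text.length() - 1);
        }
        return text;
    }

    public static void main(String[] args) {
        String raw = "One paragraph is ok\7";
        // The last character of the literal is the cell mark itself
        System.out.println(raw.charAt(raw.length() - 1) == CELL_MARK);
        System.out.println(stripCellMark(raw));
    }
}
```

A helper like this would only tidy up the comparisons; it does not address the disagreement between the paragraph and character run ranges described in points 1) and 2).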