I had written this simple piece of code:

    FileInputStream fileInputStream = new FileInputStream(new File("C:\\in.doc"));
    FileOutputStream fileOutputStream = new FileOutputStream(new File("C:\\out.doc"));
    HWPFDocument hwpfDocument = new HWPFDocument(fileInputStream);
    Range range = hwpfDocument.getRange();
    int numParagraph = range.numParagraphs();
    for (int i = 0; i < numParagraph; i++) {
        Paragraph paragraph = range.getParagraph(i);
        int numCharRuns = paragraph.numCharacterRuns();
        for (int j = 0; j < numCharRuns; j++) {
            CharacterRun charRun = paragraph.getCharacterRun(j);
            String text = charRun.text();
            // Replace the whole text of each character run
            charRun.replaceText(text, "added");
        }
    }
    hwpfDocument.write(fileOutputStream);

After execution, the output file is corrupted (I tried to open it with Office 2007); the input file opens properly. The input file is very basic, with 4-5 simple lines. No exception is thrown. I had written similar code using XWPFDocument, and it works fine.
One addition: fileOutputStream.close() is present in the original code; by mistake, I didn't copy it when raising the defect. Please note that the following code works perfectly:

    XWPFDocument xwpfDocument = new XWPFDocument(inputStream);
    List<XWPFParagraph> paragraphs = xwpfDocument.getParagraphs();
    for (XWPFParagraph xParagraph : paragraphs) {
        for (XWPFRun xwpfRun : xParagraph.getRuns()) {
            xwpfRun.setText("replace", 0);
        }
    }
    xwpfDocument.write(outputStream);
    outputStream.close();
Which version of POI? Please try with the latest build from trunk, there have been quite a lot of updates recently. If the problem is still there, please attach the problematic file. Without the input .doc file we can't do much to help you. Yegor
Created attachment 27346: test input file for hwpf

I have tested with 6-Jul-2011 nightly build. poi-bin-3.8-beta3-20110606.tar
Now I have tested with yesterday's nightly build poi-3.8-beta4-20110803 from http://encore.torchbox.com/poi-svn-build/
Very interested in this bug. Having the same issue. Can't use XWPF since contract requires us to maintain original version.
Fixed in r1155211 (3.8-beta4-20110810 or later). Please, test.
Created attachment 27369: test output file for hwpf

I have tested with 3.8-beta4-20110810, but the defect still exists. I have now added a check so that the text is replaced only when the line contains the string "Header" (please see the input doc):

    if (text.contains("Header"))
        charRun.replaceText(text, "added");

I compared the output file with the input using Beyond Compare (a file comparison tool). It shows that the text has been replaced properly, but the document format is corrupted. I don't know much about the doc format, so I cannot comment further. I have attached the output file; I hope it helps.
Created attachment 27370: comparison result
Sanmoy, I don't see any difference between in.doc and out.doc, except the "Header" -> "added" change. What exactly is broken? Sergey
Created attachment 27371: error-snap-shot

Please try to open the output file with MS Office 2003 or 2007; it reports the file as corrupted.
Sanmoy, I fixed a couple of issues (FIB and stylesheet processing) that may be the reason the file does not open in Microsoft Office. Please try the next nightly build or the trunk version. The resulting file still does not pass the binary file validation tool, though :( Sergey
Sanmoy, Please, check the latest trunk version or 3.8-beta4 or later.
the defect has been fixed .. thanks
    /**
     * Replace (all instances of) a piece of text with another...
     *
     * @param pPlaceHolder
     *            The text to be replaced (e.g., "${organization}")
     * @param pValue
     *            The replacement text (e.g., "Apache Software Foundation")
     */
    public void replaceText(String pPlaceHolder, String pValue)

The replaceText API does not work if the String pValue contains the String pPlaceHolder. For example, if pPlaceHolder="abcd" and pValue="abcd", "abcdef" or "12abcdef", this code goes into an infinite loop.

To reproduce, modify the original test code to call charRun.replaceText(text, text); that is, try to replace the original value with itself. It will not work; it falls into an infinite loop.

For your convenience, I am copying the original code again. Please test it with the attached files.

    FileInputStream fileInputStream = new FileInputStream(new File("C:\\in.doc"));
    FileOutputStream fileOutputStream = new FileOutputStream(new File("C:\\out.doc"));
    HWPFDocument hwpfDocument = new HWPFDocument(fileInputStream);
    Range range = hwpfDocument.getRange();
    int numParagraph = range.numParagraphs();
    for (int i = 0; i < numParagraph; i++) {
        Paragraph paragraph = range.getParagraph(i);
        int numCharRuns = paragraph.numCharacterRuns();
        for (int j = 0; j < numCharRuns; j++) {
            CharacterRun charRun = paragraph.getCharacterRun(j);
            String text = charRun.text();
            // Replacing a value with itself triggers the infinite loop
            charRun.replaceText(text, text);
        }
    }
    hwpfDocument.write(fileOutputStream);
    fileOutputStream.close();

I have tested with the latest nightly build, 3.8-beta5-20110904.

I have debugged the POI code and found the problem in the following logic:

    String text = text();
    int offset = text.indexOf(pPlaceHolder);

text() returns the text after the replacement, so if the replacement value contains the original String, offset will always be >= 0 and will keep increasing:

    public void replaceText(String pPlaceHolder, String pValue) {
        boolean keepLooking = true;
        while (keepLooking) {
            String text = text();
            int offset = text.indexOf(pPlaceHolder);
            if (offset >= 0)
                replaceText(pPlaceHolder, pValue, offset);
            else
                keepLooking = false;
        }
    }
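For what it's worth, the looping described above can be avoided by resuming the search after the inserted value instead of rescanning the text from the start on every pass. Below is a minimal sketch in plain Java, assuming the run's text can be modeled as a StringBuilder. ReplaceTextSketch and replaceAll are hypothetical names used only for illustration; they are not POI APIs, and this is not necessarily how Range.replaceText is or will be fixed in POI.

```java
// Sketch only: models the text of a range as a StringBuilder, not a POI Range.
final class ReplaceTextSketch {

    /**
     * Replaces every occurrence of placeholder with value. After each
     * replacement the search resumes just past the inserted value, so the
     * replacement text itself is never scanned again.
     */
    static String replaceAll(String text, String placeholder, String value) {
        if (placeholder.isEmpty()) {
            return text; // an empty placeholder would match at every offset
        }
        StringBuilder buffer = new StringBuilder(text);
        int from = 0;
        while (true) {
            int offset = buffer.indexOf(placeholder, from);
            if (offset < 0) {
                break; // no further occurrences after the last replacement
            }
            buffer.replace(offset, offset + placeholder.length(), value);
            from = offset + value.length(); // skip over what was just inserted
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // The value contains the placeholder, yet the loop terminates.
        System.out.println(replaceAll("abcd and abcd", "abcd", "12abcdef"));
        // Replacing a value with itself also terminates.
        System.out.println(replaceAll("abcd", "abcd", "abcd"));
    }
}
```

The same idea would presumably carry over to the quoted method by passing a start index to text.indexOf(pPlaceHolder, from) and advancing from by pValue.length() after each replacement, but that is untested against POI itself.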
Hi, I have tested poi-bin-3.8-beta5-20111217.tar.gz with an MS Office 2003 input file using charRun.replaceText("XYZ", "ABC"); and the output file is corrupted. Without the replacement the output is OK, so I guess the problem comes from the replacement logic. Also, with poi-bin-3.8-beta4 it worked if the length of pValue was equal to the length of pPlaceHolder; otherwise the output file was also corrupted. Any ideas? Thanks