org/apache/poi/hssf/dev/HSSF.java org/apache/poi/hssf/record/TickRecord.java ------------------------------------------ There is a typo. Original code has the following semantic: this.filename = filename this.field_12_zero5 = field_12_zero5 //resp. which is (in these methods) equal to: this.filename = this.filename this.field_12_zero5 = this.field_12_zero5 //resp. This makes no sense org/apache/poi/hssf/model/Sheet.java ------------------------------------ 1. The often called method 'addValueRecord' creates a lot of garbage int[] Objects. They are needed just for debug-Tracing. I surrounded it with a standard 'if(Logger.isDebugEnabled...)' 2. The columns.getIterator() is expensive, as it always generates a Garbage-Object (the Iterator itself). A small refactoring to an Array-Loop with index speeds up POI. org/apache/poi/hssf/record/aggregates/ColumnInfoRecordsAggregate.java --------------------------------------------------------------------- see description for 'org/apache/poi/hssf/model/Sheet.java', point 2. org/apache/poi/hssf/record/RowRecord.java org/apache/poi/hssf/record/UnicodeString.java --------------------------------------------- All the BitField-Helper-Objects can be static, as they are invariant. The idea to reduce garbage with the BitFieldFactory.getInstance(...) is good, but there also be an Garbage-Object: An java.lang.Integer to lookup the cache-Map. org/apache/poi/hssf/record/aggregates/ValueRecordsAggregate.java ---------------------------------------------------------------- I changed the 'backing collection' from TreeMap to a 2-dimensional Array of CellValueRecordInterface. The TreeMap is pretty expensive, as every entry needs a TreeMap.Entry Object. We produce much less Garbage-Objects with this implementation. org/apache/poi/hssf/usermodel/DummyGraphics2d.java -------------------------------------------------- Eclipse means that there should be an explicit call of 'toString' for arrays. Was not really a bug. org/apache/poi/hssf/usermodel/HSSFCell.java ------------------------------------------- The idea of this change is to not have redundant Values in the HSSFCell-Object an the corresponding CellValueRecord. Especially the following Attributes in HSSFCell have been removed: cellNum (=column-Number) cellStyle (=Style-Index) cellValue (=value if Cell is Numeric) booleanValue (=dito if Cell is Boolean) errorValue (=dito if....) rot (=row Number) Access to these values is now delegated to the underlying CellValueRecord. All together we gain speed and save heap. An Example where 50*1000 Cells (25'000 Strings + 25'000 Numerics) were generated and then written via a byte[] to a file: Before: 1000ms, over 600'000 Objects (mostly Garbage), 22MB Heap-usage After : 370ms, 210'000 Objects (70'000 Garbage), 12MB Heap-usage I hope that you can use some of these optimizations. Greetings, Oliver Henning aloba ag Switzerland
Created attachment 17042 [details] Patch-file
Oliver, Wow this all looks really good, and something that i have been wanting to look at getting arround to. One idea that i had also was that the HSSFRow, HSSFCell objects are kept but for most applications we basically create a row, some cells and then move onto the next row and repeat. All of the HSSFRow & HSSFCell objects from the previous row could therefore be garbage collected if they could be re-created from the low-level objects as required. Ok this results in alot of garbage, but in a server environment this would lower the overall memory consumption with a bit of processor gc increase. If there was a switch to determine whether the user model objects could be created on-demand then that woudl be grand. Maybe someone will look at this before i have the chance. hint hint! In the mean time ill look at commiting the patch. Jason
Salü Jason Good Idea, but quite a big thing to implement, hm? But there is still potential in the existing model: 1) HSSFRichTextString in HSSFCell is still a 'duplicate' of the low-level-record. This Class is new (compared to poi-2.5.1-final-20040804) and pretty heavy-weight compared to the 'good old string' in POI-2.5.1. Perhaps there could be both models and the user decides whether he can use the one (fast string) or the other (rich text) 2) The method HSSFRow.createCell(short column, int type) is deprecated (but I don't know why). You should always use the 'simpler' method HSSFRow.createCell(short column) which constructs a 'blank-record'. This record normally becomes garbage with the next call to cell.setCellValue(...). It is much more efficient if you say directly at 'createCell' which type of cell you need. So why not remove the '@deprecated'? 3) It could be possible to remove the attribute "int cellType" from HSSFCell. This is redundant, as you can identify the type also with "record instanceof XXX" And finally, for short: we like POI ;-) Greetings, Oli
Patch applied to SVN. Needed modifications so that the unit tests would pass. Oliver looking forward to more Performance patches ;-) Jason