Bug 66620 - Heap dump when using SXSSF for large XLSX files containing RichTextStrings
Summary: Heap dump when using SXSSF for large XLSX files containing RichTextStrings
Status: RESOLVED INFORMATIONPROVIDED
Alias: None
Product: POI
Classification: Unclassified
Component: SXSSF (show other bugs)
Version: 5.2.3-FINAL
Hardware: PC Linux
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-05-30 10:41 UTC by Tonny
Modified: 2023-10-14 22:39 UTC (History)
0 users



Attachments
Export of many RichTextStrings with font styles To Excel (4.61 KB, text/x-java)
2023-05-30 10:41 UTC, Tonny
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Tonny 2023-05-30 10:41:10 UTC
Created attachment 38567 [details]
Export of many RichTextStrings with font styles To Excel

If you create Excel (XLSX) files using SXSSF and you want to have font attributes (e.g. sub and superscript) applied to sub parts of a string then you have to use RichTextStrings. Currently, it is only possible to get it to work with SXSSF if you set the parameter 'useSharedStringTable' to 'true' on the SXSSFWorkbook.

Setting 'useSharedStringTable' to 'true' will make it store the strings in memory in a structure which encapsulates the string. The problem is that when the Workbook flush its rows it does not clear the structure containing the string data. Hence, it piles up when you export many rows and may eventually result in a heap dump depending on the number of rows being exported and the available memory.

When using SXSSF and 'useSharedStringTable' set to 'true' then it should clear the underlying structure for the rows which has been flushed in order to keep memory consumption low.

I've attached a little Java example which can be used to reproduce the problem.
Comment 1 Tonny 2023-05-30 10:43:43 UTC
Comment on attachment 38567 [details]
Export of many RichTextStrings with font styles To Excel

>package com.foo.export;
>
>import java.io.FileOutputStream;
>import java.io.OutputStream;
>
>import org.apache.poi.ss.usermodel.CreationHelper;
>import org.apache.poi.ss.usermodel.Font;
>import org.apache.poi.ss.usermodel.IndexedColors;
>import org.apache.poi.ss.usermodel.RichTextString;
>import org.apache.poi.ss.usermodel.Row;
>import org.apache.poi.ss.usermodel.Sheet;
>import org.apache.poi.xssf.streaming.SXSSFCell;
>import org.apache.poi.xssf.streaming.SXSSFWorkbook;
>import org.apache.poi.xssf.usermodel.XSSFFont;
>import org.apache.poi.xssf.usermodel.XSSFWorkbook;
>
>public class ExportToExcelTest {
>
>	/**
>	 * Test which created a number of rows in an Excel document using SXSSF.
>	 * Each of the RichTextStrings in the cells are unique.
>	 *
>	 * The test is used to show that SXSSF can export RichTextString's, but it has a problem as
>	 * it require it to use SharedStringsTable. This cause memory consumption to explode as the
>	 * structure used to store the strings are not cleared for the rows which are flushed.
>	 */
>	public static void main(String[] args) throws Exception {
>
>//		final int NUMBER_OF_ROWS_TO_CREATE = 1000; // Works OK
>		final int NUMBER_OF_ROWS_TO_CREATE = 600000; // Breaks on my system as it gives an 'Exception in thread "main" java.lang.OutOfMemoryError: Java heap space'
>
>		//
>		// Create Workbook
>		//
>
>		// In order to be able to apply font styles on RichTextStrings we need to keep the strings in memory.
>		// This is a huge disadvantage as the memory consumption will increase greatly as the structure containing
>		// the strings are not reduced/cleared when the rows are flushed.
>		SXSSFWorkbook wb = new SXSSFWorkbook(null, 100, false, true);
>//		SXSSFWorkbook wb = new SXSSFWorkbook(new XSSFWorkbook(), 100, false, true); // Seems to result in the same as the above
>
>		//
>		// Create fonts used in test
>		//
>
>		// Create font with font 'Impact' and color dark blue
>		XSSFFont font1 = (XSSFFont) wb.createFont();
>		font1.setFontName("Impact");
>		font1.setColor(IndexedColors.DARK_BLUE.getIndex());
>
>		// Crete font with font 'Arial' and color light blue
>		XSSFFont font2 = (XSSFFont) wb.createFont();
>		font2.setFontName("Arial");
>		font2.setColor(IndexedColors.LIGHT_BLUE.getIndex());
>
>		// Create bold font
>		XSSFFont font_bold = (XSSFFont) wb.createFont();
>		font_bold.setBold(true);
>
>		// Create italic font
>		XSSFFont font_italic = (XSSFFont) wb.createFont();
>		font_italic.setItalic(true);
>
>		// Create font with double underline and red color
>		XSSFFont font5 = (XSSFFont) wb.createFont();
>		font5.setUnderline(XSSFFont.U_DOUBLE);
>		font5.setColor(IndexedColors.RED.getIndex());
>
>		// Create sub script font
>		XSSFFont font_sub = (XSSFFont) wb.createFont();
>		font_sub.setTypeOffset(Font.SS_SUB);
>
>		// Create super script font
>		XSSFFont font_super = (XSSFFont) wb.createFont();
>		font_super.setTypeOffset(Font.SS_SUPER);
>
>		//
>		// Create a sheet to put the rows into
>		//
>		Sheet sheet = wb.createSheet();
>
>		//
>		// Create X rows with almost the same data
>		//
>		for (int i = 0; i < NUMBER_OF_ROWS_TO_CREATE; ++i) {
>			CreationHelper creationHelper = wb.getCreationHelper();
>
>			Row row = sheet.createRow(i);
>			int column = 0;
>
>			// Create cell for content with RichText in it
>			SXSSFCell cell_1 = (SXSSFCell) row.createCell(column++);
>
>			// Rich text string with two fonts applied - one used in two places
>			RichTextString richString = creationHelper.createRichTextString("JavaCodePoint " + i);
>			richString.applyFont(0, 4, font1);
>			richString.applyFont(4, 8, font2);
>			richString.applyFont(8, 13, font1);
>			cell_1.setCellValue(richString);
>
>			// Create another cell
>			SXSSFCell cell_2 = (SXSSFCell) row.createCell(column++);
>
>			RichTextString richString2 = creationHelper.createRichTextString("This is another example of XSSFRichTextString with sub and superscript a1 m2 " + i);
>			richString2.applyFont(0, 15, font_bold);
>			richString2.applyFont(15, 26, font_italic);
>			richString2.applyFont(27, 45, font5);
>			richString2.applyFont(72, 73, font_sub);
>			richString2.applyFont(75, 76, font_super);
>			cell_2.setCellValue(richString2);
>
>			// Add 10 additional columns with a text string
>			for (int j = 0; j < 10; j++) {
>				SXSSFCell cell = (SXSSFCell) row.createCell(column++);
>				RichTextString richTextString = creationHelper.createRichTextString("Some content (" + i + "," + j + ")");
>				cell.setCellValue(richTextString);
>			}
>
>		}
>
>		// Write the output to a file
>		try (OutputStream fileOut = new FileOutputStream("RichTextStringExample.xlsx")) {
>			wb.write(fileOut);
>		}
>
>		// close the workbook
>		wb.close();
>
>		// Print the confirmation message
>		System.out.println("Excel file was created successfully!");
>	}
>
>}
>
Comment 2 PJ Fanning 2023-05-30 10:50:46 UTC
You could try poi-shared-strings - it has supports keeping the shared strings in a temp file to avoid the memory overhead. https://github.com/pjfanning/poi-shared-strings. SXSSF sample in https://github.com/pjfanning/poi-shared-strings-sample