Bug 57401 - [PATCH] POI SharedStringsTable's MapDB implementation to reduce memory footprint
Summary: [PATCH] POI SharedStringsTable's MapDB implementation to reduce memory footprint
Status: RESOLVED WONTFIX
Alias: None
Product: POI
Classification: Unclassified
Component: SXSSF (show other bugs)
Version: unspecified
Hardware: PC All
: P2 enhancement with 1 vote (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords: PatchAvailable
Depends on:
Blocks: 52484
  Show dependency tree
 
Reported: 2014-12-29 07:29 UTC by rakesh kumar suthar
Modified: 2018-07-09 18:55 UTC (History)
3 users (show)



Attachments
attachment contains additional class and enum with tweek in existing code and test cases , writing part of test case require less then -Xmx100M to write data (32.91 KB, text/plain)
2014-12-29 07:29 UTC, rakesh kumar suthar
Details
patch wiht minor changes (33.40 KB, patch)
2014-12-29 08:32 UTC, rakesh kumar suthar
Details | Diff
[patch-alternate way] attaching patch to Integrating DBMappedSharedStringsTable by using SharedStringsTableType in SXSSFWorkbook (25.99 KB, patch)
2014-12-30 09:06 UTC, rakesh kumar suthar
Details | Diff
heap usage comparison 1,00,000 unique record of 100 char between default SharedStringTable vs SharedStringsTable Using MapDB (69.95 KB, image/png)
2014-12-30 09:52 UTC, rakesh kumar suthar
Details
heap usage comparison 1,00,000 unique record of 1000 char between default SharedStringTable vs SharedStringsTable Using MapDB (90.69 KB, image/png)
2014-12-30 09:54 UTC, rakesh kumar suthar
Details
heap usage comparison 10,00,000 unique record of 1000 char between default SharedStringTable vs SharedStringsTable Using MapDB (302.13 KB, image/png)
2014-12-30 09:55 UTC, rakesh kumar suthar
Details

Note You need to log in before you can comment on or make changes to this bug.
Description rakesh kumar suthar 2014-12-29 07:29:56 UTC
Created attachment 32333 [details]
attachment contains additional class and enum with tweek in existing code and test cases , writing part of test case require less then -Xmx100M to write data

Problem : SXSSFWorkbook defaults to using inline strings instead of a shared strings table.This is very efficient, since no document content needs to be kept in memory, but is also known to produce documents that are incompatible with some clients and work book size will be large.

SXSSFWorkbook with shared strings enabled all unique strings in the document has to be kept in memory but it use a lot more resources than with shared strings disabled.

Solution : To reduce memory footprint of POI’s shared strings table implementation we implemented shared strings table usin MapDB.

Overall, the MapDB solution is slower than pure POI, but takes much lesser amount of memory.


Attached patch 
We couldn't so far find a clean way to achieve this without patching POI code.

To achieve this we have added SharedStringsTable type (Default or MapDB) to use while constructing XSSFWorkbook through it's constructor which and overridden write(OutputStream stream) method from from POIXMLDocument(by removing final keyword from this method to override it).

DBMappedSharedStringsTable class extends from SharedStringsTable which have logic to flows data to disk as per availability of memory.
Mirror of Apache POI
Comment 1 rakesh kumar suthar 2014-12-29 08:29:28 UTC
Comment on attachment 32333 [details]
attachment contains additional class and enum with tweek in existing code and test cases , writing part of test case require less then -Xmx100M to write data

Index: src/ooxml/testcases/org/apache/poi/xssf/model/TestDBMappedSharedStringsTableOption.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- src/ooxml/testcases/org/apache/poi/xssf/model/TestDBMappedSharedStringsTableOption.java	(revision )
+++ src/ooxml/testcases/org/apache/poi/xssf/model/TestDBMappedSharedStringsTableOption.java	(revision )
@@ -0,0 +1,206 @@
+package org.apache.poi.xssf.model;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.ss.usermodel.Workbook;
+import org.apache.poi.xssf.SXSSFITestDataProvider;
+import org.apache.poi.xssf.streaming.SXSSFSheet;
+import org.apache.poi.xssf.streaming.SXSSFWorkbook;
+import org.apache.poi.xssf.usermodel.XSSFWorkbook;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.*;
+import java.util.Random;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestDBMappedSharedStringsTableOption {
+    //Streaming version of workbook
+    private SXSSFWorkbook workbook;
+
+    private SXSSFSheet sheet;
+
+    private File outputFile;
+    public static final String TEST_OUTPUT_DIR = "poi.test.xssf.output.dir";
+
+    @Before
+    public void setUp() {
+        outputFile = new File(System.getProperty(TEST_OUTPUT_DIR), "output.xlsx");
+        setupWorkBook();
+        setupBlankSheet();
+    }
+
+    private void setupWorkBook() {
+        XSSFWorkbook wb = new XSSFWorkbook(SharedStringsTableType.LOW_FOOTPRINT_MAP_DB_SST);
+        workbook = new SXSSFWorkbook(wb, 2, false, true);
+    }
+
+    private void setupBlankSheet() {
+        sheet = (SXSSFSheet) workbook.createSheet("Employee Data");
+    }
+
+    @After
+    public void cleanup() {
+        outputFile.delete();
+    }
+
+    @Test
+    public void testWrite100UniqueRecordsOf10Char() throws IOException {
+        int recordCount = 100;
+        addUniqueRecordsToSheet(0, 100, 10);
+        writeAndAssertRecord(recordCount);
+    }
+
+    @Test
+    public void testWrite1MUniqueRecordsOf100Char() {
+        int recordCount = 1000000;
+        addUniqueRecordsToSheet(0, recordCount, 100);
+        writeAndAssertRecord(recordCount);
+    }
+
+    @Test
+    public void testWriteFromTextFile() {
+        int recordCount = 3;
+        File textInputFile = new File(System.getProperty(TEST_OUTPUT_DIR), "temp.txt");
+        try {
+            FileWriter w = new FileWriter(textInputFile);
+            for (int i = 1; i <= recordCount; i++) {
+                w.write("Line" + i + ",FirstColumn,SecondColumn,ThirdColumn\r\n");
+            }
+            w.close();
+        } catch (IOException e) {
+        }
+        addRecordsFromFile("temp.txt");
+        writeAndAssertRecord(recordCount);
+        textInputFile.delete();
+    }
+
+    @Test
+    public void testWrite1MRandomRecordsOf10Char() {
+        int recordCount = 100000;
+        addRandomRecordsToSheet(0, recordCount, 200000, 10);
+        writeAndAssertRecord(recordCount);
+    }
+
+    @Test
+    public void test1MRecordHavingRepetitiveRecordsOf10Char() {
+        int recordCount = 1000000;
+        addUniqueRecordsToSheet(0, 200000, 10);
+        addUniqueRecordsToSheet(200000, 200000, 10);
+        addUniqueRecordsToSheet(400000, 200000, 10);
+        addUniqueRecordsToSheet(600000, 200000, 10);
+        addUniqueRecordsToSheet(800000, 200000, 10);
+        writeAndAssertRecord(recordCount);
+    }
+
+    @Test
+    public void testWriteAllDuplicateRecord() {
+        int recordCount = 100000;
+        addRepeatingRecordsToSheet(recordCount);
+        writeAndAssertRecord(recordCount);
+    }
+
+    private void writeAndAssertRecord(int recordCount) {
+        System.out.print("Started writing.....");
+        //NOTE: all tests can be executed within -Xmx100M by commenting out out code below
+        //----
+        XSSFWorkbook wb = (XSSFWorkbook) SXSSFITestDataProvider.instance.writeOutAndReadBack(workbook);
+        System.out.println("File creation done...Asserting");
+        assertRows(wb, recordCount);
+        //----
+    }
+
+    private void addUniqueRecordsToSheet(int fromRowNum, int numberOfRecords, int constantStringLength) {
+        System.out.print("adding records to sheet.....");
+        int i = 0;
+        String constantString = getStringOf(constantStringLength);
+        while (i++ < numberOfRecords) {
+            if (i % 10000 == 0) System.out.print(i + ",");
+            Row row = sheet.createRow(fromRowNum++);
+            Object[] objArr = new Object[]{constantString + i};
+            int cellNum = 0;
+            for (Object obj : objArr) {
+                Cell cell = row.createCell(cellNum++);
+                if (obj instanceof String) {
+                    cell.setCellValue((String) obj);
+                } else if (obj instanceof Integer)
+                    cell.setCellValue((Integer) obj);
+            }
+        }
+    }
+
+    private String getStringOf(int length) {
+        StringBuilder str = new StringBuilder();
+        for (int j = 0; j < length; j++) {
+            str.append("a");
+        }
+        return str.toString();
+    }
+
+    private void addRandomRecordsToSheet(int fromRowNum, int numberOfRecords, int recordLength, int constantStringLength) {
+        int i = 0;
+        String constantString = getStringOf(constantStringLength);
+        while (i++ < numberOfRecords) {
+            if (i % 1000 == 0) System.out.print(i + ",");
+            Row row = sheet.createRow(fromRowNum++);
+            Object[] objArr = new Object[]{constantString + new Random().nextInt(recordLength)};
+            int cellNum = 0;
+            for (Object obj : objArr) {
+                Cell cell = row.createCell(cellNum++);
+                if (obj instanceof String)
+                    cell.setCellValue((String) obj);
+                else if (obj instanceof Integer)
+                    cell.setCellValue((Integer) obj);
+            }
+        }
+    }
+
+    private void addRecordsFromFile(String fileName) {
+        System.out.print("adding records to sheet.....");
+        try {
+            int fromRowNum = 0;
+            BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName)));
+            String line = null;
+            while ((line = br.readLine()) != null) {
+                Row row = sheet.createRow(fromRowNum++);
+                Object[] objArr = line.split(",");
+                int cellNum = 0;
+                for (Object obj : objArr) {
+                    Cell cell = row.createCell(cellNum++);
+                    if (obj instanceof String)
+                        cell.setCellValue((String) obj);
+                    else if (obj instanceof Integer)
+                        cell.setCellValue((Integer) obj);
+                }
+            }
+            br.close();
+        } catch (Exception e) {
+            e.printStackTrace();
+        }
+    }
+
+    private void addRepeatingRecordsToSheet(int count) {
+        int rownum = 0;
+        int i = 0;
+        String constantString = getStringOf(10);
+        while (i++ < count) {
+            Row row = sheet.createRow(rownum++);
+            Object[] objArr = new Object[]{constantString};
+            int cellnum = 0;
+            for (Object obj : objArr) {
+                Cell cell = row.createCell(cellnum++);
+                if (obj instanceof String)
+                    cell.setCellValue((String) obj);
+                else if (obj instanceof Integer)
+                    cell.setCellValue((Integer) obj);
+            }
+        }
+    }
+
+    public void assertRows(Workbook wb, int expectedRecordCount) {
+        assertEquals(expectedRecordCount, wb.getSheetAt(0).getLastRowNum() + 1);
+    }
+
+}
Index: src/ooxml/java/org/apache/poi/xssf/model/SharedStringsTableType.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- src/ooxml/java/org/apache/poi/xssf/model/SharedStringsTableType.java	(revision )
+++ src/ooxml/java/org/apache/poi/xssf/model/SharedStringsTableType.java	(revision )
@@ -0,0 +1,37 @@
+/* ====================================================================
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+==================================================================== */
+package org.apache.poi.xssf.model;
+
+/**
+ * enum to specify shared strings table to use
+ */
+public enum SharedStringsTableType {
+    DEFAULT_SST(SharedStringsTable.class),//in memory shared strings string table
+    LOW_FOOTPRINT_MAP_DB_SST(DBMappedSharedStringsTable.class); //streaming version low foot print shared strings table
+    /**
+     * Defines what object is used to construct instances of this relationship
+     */
+    private Class<? extends SharedStringsTable> instance;
+
+    private SharedStringsTableType(Class<? extends SharedStringsTable> sharedStringsTableInstance) {
+        instance = sharedStringsTableInstance;
+    }
+
+    public Class<? extends SharedStringsTable> getInstance() {
+        return instance;
+    }
+}
Index: src/ooxml/java/org/apache/poi/POIXMLDocument.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- src/ooxml/java/org/apache/poi/POIXMLDocument.java	(revision 8bf3ebfcee2e2799486de4ed8a5e063616b4083a)
+++ src/ooxml/java/org/apache/poi/POIXMLDocument.java	(revision )
@@ -189,7 +189,7 @@
      *
      * @exception IOException if anything can't be written.
      */
-    public final void write(OutputStream stream) throws IOException {
+    public void write(OutputStream stream) throws IOException {
         //force all children to commit their changes into the underlying OOXML Package
         Set<PackagePart> context = new HashSet<PackagePart>();
         onSave(context);
Index: src/ooxml/java/org/apache/poi/xssf/model/DBMappedSharedStringsTable.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- src/ooxml/java/org/apache/poi/xssf/model/DBMappedSharedStringsTable.java	(revision )
+++ src/ooxml/java/org/apache/poi/xssf/model/DBMappedSharedStringsTable.java	(revision )
@@ -0,0 +1,311 @@
+/* ====================================================================
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+==================================================================== */
+
+package org.apache.poi.xssf.model;
+
+import org.apache.poi.openxml4j.opc.PackagePart;
+import org.apache.poi.openxml4j.opc.PackageRelationship;
+import org.apache.poi.util.TempFile;
+import org.apache.xmlbeans.XmlException;
+import org.apache.xmlbeans.XmlOptions;
+import org.mapdb.DB;
+import org.mapdb.DBMaker;
+import org.mapdb.HTreeMap;
+import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst;
+import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTSst;
+import org.openxmlformats.schemas.spreadsheetml.x2006.main.SstDocument;
+
+import javax.xml.stream.XMLStreamException;
+import java.io.*;
+import java.math.BigInteger;
+import java.security.SecureRandom;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+/**
+ * SharedStringsTable With Map DB implementation
+ * </p>
+ *
+ */
+public class DBMappedSharedStringsTable extends SharedStringsTable implements AutoCloseable{
+
+    /**
+     * Maps strings and their indexes in the <code>recordVsIndexBasedSTMap</code> map db
+     */
+    private DB recordVsIndexMapDB;
+    private HTreeMap<String, Integer> recordVsIndexBasedSTMap; //string vs index map to lookup existing record in stTable
+    /**
+     * Maps strings and their indexes in the <code>recordVsIndexBasedSTMap</code> map db
+     */
+    private DB indexVsRecordMapDB;
+    private HTreeMap<Integer, String> indexVsRecordBasedSTMap; //index vs string map to retrieve record with index
+
+    private final File temp_shared_string_file;
+
+    /**
+     * An integer representing the total count of strings in the workbook. This count does not
+     * include any numbers, it counts only the total of text strings in the workbook.
+     */
+    private int count;
+
+    /**
+     * An integer representing the total count of unique strings in the Shared String Table.
+     * A string is unique even if it is a copy of another string, but has different formatting applied
+     * at the character level.
+     */
+    private int uniqueCount;
+
+    private SstDocument _sstDoc;
+
+    private final static XmlOptions options = new XmlOptions();
+    private final static XmlOptions out_options = new XmlOptions();
+
+
+    static {
+        options.put(XmlOptions.SAVE_INNER);
+        options.put(XmlOptions.SAVE_AGGRESSIVE_NAMESPACES);
+        options.put(XmlOptions.SAVE_USE_DEFAULT_NAMESPACE);
+        options.setSaveImplicitNamespaces(Collections.singletonMap("", "http://schemas.openxmlformats.org/spreadsheetml/2006/main"));
+
+        out_options.setLoadSubstituteNamespaces(Collections.singletonMap("", "http://schemas.openxmlformats.org/spreadsheetml/2006/main"));   //TODO add options if required
+    }
+
+    public DBMappedSharedStringsTable() {
+        super();
+        temp_shared_string_file = createTempFile("poi-shared-string-table", ".xml");
+        initMapDbBasedSharedStringTableMap();
+    }
+
+    private File createTempFile(String prefix, String suffix) {
+        try {
+            return TempFile.createTempFile(prefix, suffix);
+        } catch (IOException e) {
+            throw new RuntimeException("Couldn't create required temp file", e);
+        }
+    }
+
+    public DBMappedSharedStringsTable(PackagePart part, PackageRelationship rel) throws IOException {
+        super(part, rel);//TODO needs to be commented out whiler reading
+        temp_shared_string_file = createTempFile("poi-shared-string-table", ".xml");
+        initMapDbBasedSharedStringTableMap();
+        readFrom(part.getInputStream());
+    }
+
+    public FileInputStream getSharedStringInputStream() throws IOException {
+        return new FileInputStream(temp_shared_string_file);
+    }
+
+    public FileOutputStream getSharedStringsTableOutputStream() throws IOException {
+        return new FileOutputStream(temp_shared_string_file);
+    }
+
+    public File getTemp_shared_string_file() {
+        return temp_shared_string_file;
+    }
+
+    private void initMapDbBasedSharedStringTableMap() {
+        initRecordVsIndexBasedMapDB();
+        initIndexVsRecordBasedMapDB();
+    }
+
+    private void initRecordVsIndexBasedMapDB() {
+        File mapDbFile = createTempFile(new BigInteger(130, new SecureRandom()).toString(32), "");//creating random name file to store map db
+        recordVsIndexMapDB = DBMaker.newFileDB(mapDbFile)
+                .transactionDisable()
+                .cacheHardRefEnable()
+                .cacheSize(65536)
+                .deleteFilesAfterClose()
+                .mmapFileEnablePartial()
+                .closeOnJvmShutdown().make();
+        recordVsIndexBasedSTMap = recordVsIndexMapDB.createHashMap(new BigInteger(130, new SecureRandom()).toString(32)).make();
+    }
+
+    private void initIndexVsRecordBasedMapDB() {
+        File mapDb2File = createTempFile(new BigInteger(130, new SecureRandom()).toString(32), "");//creating random name file to store map db
+        indexVsRecordMapDB = DBMaker.newFileDB(mapDb2File)
+                .transactionDisable()
+                .cacheDisable() //caching not required indexVsRecordBasedSTMap will be used to write all existing values
+                .deleteFilesAfterClose()
+                .mmapFileEnablePartial()
+                .closeOnJvmShutdown().make();
+        indexVsRecordBasedSTMap = indexVsRecordMapDB.createHashMap(new BigInteger(130, new SecureRandom()).toString(32)).make();
+    }
+
+    /**
+     * Read this shared strings table from an XML file.
+     *
+     * @param is The input stream containing the XML document.
+     * @throws java.io.IOException if an error occurs while reading.
+     */
+    @SuppressWarnings("deprecation") //YK: getXYZArray() array accessors are deprecated in xmlbeans with JDK 1.5 support
+    public void readFrom(InputStream is) throws IOException {
+        try {
+            int cnt = 0;
+            _sstDoc = SstDocument.Factory.parse(is);
+            CTSst sst = _sstDoc.getSst();
+            count = (int) sst.getCount();
+            uniqueCount = (int) sst.getUniqueCount();
+            for (CTRst st : sst.getSiArray()) {
+                String key = getKey(st);
+                recordVsIndexBasedSTMap.put(key, cnt);
+                indexVsRecordBasedSTMap.put(cnt, key);
+                cnt++;
+            }
+        } catch (XmlException e) {
+            throw new IOException(e.getLocalizedMessage());
+        }
+    }
+
+    private String getKey(CTRst st) {
+        return st.xmlText(options);
+    }
+
+    /**
+     * Return a string item by index
+     *
+     * @param idx index of item to return.
+     * @return the item at the specified position in this Shared String table.
+     */
+    public CTRst getEntryAt(int idx) {
+        try {
+            return CTRst.Factory.parse(indexVsRecordBasedSTMap.get(idx), out_options);
+        } catch (XmlException e) {
+            throw new RuntimeException("Error Parsing xmlText from SSTable");
+        }
+    }
+
+    /**
+     * Return an integer representing the total count of strings in the workbook. This count does not
+     * include any numbers, it counts only the total of text strings in the workbook.
+     *
+     * @return the total count of strings in the workbook
+     */
+    public int getCount() {
+        return count;
+    }
+
+    /**
+     * Returns an integer representing the total count of unique strings in the Shared String Table.
+     * A string is unique even if it is a copy of another string, but has different formatting applied
+     * at the character level.
+     *
+     * @return the total count of unique strings in the workbook
+     */
+    public int getUniqueCount() {
+        return uniqueCount;
+    }
+
+    /**
+     * Add an entry to this Shared String table (a new value is appened to the end).
+     * <p/>
+     * <p>
+     * If the Shared String table already contains this <code>CTRst</code> bean, its index is returned.
+     * Otherwise a new entry is aded.
+     * </p>
+     *
+     * @param st the entry to add
+     * @return index the index of added entry
+     */
+    public int addEntry(CTRst st) {
+        String s = getKey(st);
+        count++;
+        if (recordVsIndexBasedSTMap.containsKey(s)) {
+            return recordVsIndexBasedSTMap.get(s);
+        }
+        //new unique record
+        recordVsIndexBasedSTMap.put(s, uniqueCount);
+        indexVsRecordBasedSTMap.put(uniqueCount, s);
+        return uniqueCount++;
+    }
+    /**
+     * Provide low-level access to the underlying array of CTRst beans
+     *
+     * @return array of CTRst beans
+     */
+    public List<CTRst> getItems() {
+        List<CTRst> beans = new ArrayList<CTRst>();
+        for (int i = 0; i < uniqueCount; i++) {
+            beans.add(getEntryAt(i));
+        }
+        return beans;
+    }
+
+    /**
+     * Write this table out as XML.
+     *
+     * @param out The stream to write to.
+     * @throws java.io.IOException if an error occurs while writing.
+     */
+    public void writeTo(OutputStream out) throws IOException {
+        //re-create the sst table every time saving a workbook at the end after adding all record using map DB
+        try {
+            Writer writer = new BufferedWriter(new OutputStreamWriter(out, "UTF-8"));
+            addDefaultXmlOptions(writer);
+            if (uniqueCount != 0) {
+                addStringItems(writer);
+                addEndDocument(writer);
+            }
+            writer.flush();
+        } catch (XMLStreamException e) {
+            throw new RuntimeException("Couldn't write to SharedStringsTable", e);
+        }
+    }
+
+    private void addDefaultXmlOptions(Writer writer) throws XMLStreamException, IOException {
+        writer.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n");
+        String isNoSIElements = uniqueCount == 0 ? "/" : "";
+        writer.write("<sst count=\"" + count + "\" uniqueCount=\"" + uniqueCount + "\" xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"" + isNoSIElements + ">");
+    }
+
+    private void addStringItems(Writer writer) throws XMLStreamException, IOException {
+        for (int i = 0; i < uniqueCount; i++) {
+            String s = indexVsRecordBasedSTMap.get(i);
+            writer.write("<si>");
+            writer.write(s);
+            writer.write("</si>");
+        }
+    }
+
+    private void addEndDocument(Writer writer) throws XMLStreamException, IOException {
+        writer.write("</sst>");
+    }
+
+    @Override
+    protected void commit() throws IOException {
+       // createDefaultSSTTableXml();
+        FileOutputStream sharedStringOutputStream = getSharedStringsTableOutputStream();
+        writeTo(sharedStringOutputStream);
+        sharedStringOutputStream.close();
+    }
+
+    private void createDefaultSSTTableXml() throws IOException {         //Todo, check if needed to create default one
+        _sstDoc = SstDocument.Factory.newInstance();
+        PackagePart part = getPackagePart();
+        OutputStream out = part.getOutputStream();
+        _sstDoc.save(out, options);
+        out.close();
+    }
+
+    @Override
+    public void close() throws Exception {
+        recordVsIndexBasedSTMap.clear();
+        indexVsRecordBasedSTMap.clear();
+        recordVsIndexMapDB.close();
+        indexVsRecordMapDB.close();
+    }
+}
Index: src/ooxml/java/org/apache/poi/xssf/usermodel/XSSFWorkbook.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- src/ooxml/java/org/apache/poi/xssf/usermodel/XSSFWorkbook.java	(revision 8bf3ebfcee2e2799486de4ed8a5e063616b4083a)
+++ src/ooxml/java/org/apache/poi/xssf/usermodel/XSSFWorkbook.java	(revision )
@@ -20,27 +20,17 @@
 import static org.apache.poi.xssf.usermodel.helpers.XSSFPaswordHelper.setPassword;
 import static org.apache.poi.xssf.usermodel.helpers.XSSFPaswordHelper.validatePassword;
 
-import java.io.ByteArrayInputStream;
-import java.io.ByteArrayOutputStream;
-import java.io.File;
-import java.io.IOException;
-import java.io.InputStream;
-import java.io.OutputStream;
-import java.util.ArrayList;
-import java.util.Collection;
-import java.util.HashMap;
-import java.util.Iterator;
-import java.util.LinkedList;
-import java.util.List;
-import java.util.Map;
+import java.io.*;
+import java.lang.reflect.Constructor;
+import java.util.*;
 import java.util.regex.Pattern;
+import java.util.zip.ZipEntry;
+import java.util.zip.ZipFile;
+import java.util.zip.ZipOutputStream;
 
 import javax.xml.namespace.QName;
 
-import org.apache.poi.POIXMLDocument;
-import org.apache.poi.POIXMLDocumentPart;
-import org.apache.poi.POIXMLException;
-import org.apache.poi.POIXMLProperties;
+import org.apache.poi.*;
 import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
 import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
 import org.apache.poi.openxml4j.opc.OPCPackage;
@@ -61,19 +51,9 @@
 import org.apache.poi.ss.util.CellRangeAddress;
 import org.apache.poi.ss.util.CellReference;
 import org.apache.poi.ss.util.WorkbookUtil;
-import org.apache.poi.util.Beta;
-import org.apache.poi.util.IOUtils;
-import org.apache.poi.util.Internal;
-import org.apache.poi.util.POILogFactory;
-import org.apache.poi.util.POILogger;
-import org.apache.poi.util.PackageHelper;
+import org.apache.poi.util.*;
 import org.apache.poi.xssf.XLSBUnsupportedException;
-import org.apache.poi.xssf.model.CalculationChain;
-import org.apache.poi.xssf.model.ExternalLinksTable;
-import org.apache.poi.xssf.model.MapInfo;
-import org.apache.poi.xssf.model.SharedStringsTable;
-import org.apache.poi.xssf.model.StylesTable;
-import org.apache.poi.xssf.model.ThemesTable;
+import org.apache.poi.xssf.model.*;
 import org.apache.poi.xssf.usermodel.helpers.XSSFFormulaUtils;
 import org.apache.xmlbeans.XmlException;
 import org.apache.xmlbeans.XmlObject;
@@ -147,6 +127,11 @@
     private SharedStringsTable sharedStringSource;
 
     /**
+     * shared strings table type- to specify use default or map db shared strings table source
+     */
+    private SharedStringsTableType sharedStringsTableType = SharedStringsTableType.DEFAULT_SST;
+
+    /**
      * A collection of shared objects used for styling content,
      * e.g. fonts, cell styles, colors, etc.
      */
@@ -218,12 +203,22 @@
         onWorkbookCreate();
     }
 
+    public XSSFWorkbook(SharedStringsTableType sharedStringsTableType) {
+        super(newPackage());
+        this.sharedStringsTableType = sharedStringsTableType;
+        onWorkbookCreate();
+    }
+
+    public SharedStringsTableType getSharedStringsTableType() {
+        return sharedStringsTableType;
+    }
+
     /**
      * Constructs a XSSFWorkbook object given a OpenXML4J <code>Package</code> object,
      *  see <a href="http://poi.apache.org/oxml4j/">http://poi.apache.org/oxml4j/</a>.
      * 
      * <p>Once you have finished working with the Workbook, you should close the package
-     * by calling either {@link #close()} or {@link OPCPackage#close()}, to avoid 
+     * by calling either {@link #close()} or {@link OPCPackage#close()}, to avoid
      * leaving file handles open.
      * 
      * <p>Creating a XSSFWorkbook from a file-backed OPC Package has a lower memory
@@ -338,7 +333,7 @@
 
             if (sharedStringSource == null) {
                 // Create SST if it is missing
-                sharedStringSource = (SharedStringsTable)createRelationship(XSSFRelation.SHARED_STRINGS, XSSFFactory.getInstance());
+                sharedStringSource = createSSTSourceBasedOnSSTType();
             }
             
             // Load individual sheets. The order of sheets is defined by the order
@@ -376,6 +371,73 @@
         }
     }
 
+    @Override
+    public void write(OutputStream stream) throws IOException {
+        if (getSharedStringsTableType() == SharedStringsTableType.DEFAULT_SST) {
+            super.write(stream);
+        } else {
+            writeWithPatchingMDBSST(stream);
+        }
+    }
+
+    public void writeWithPatchingMDBSST(OutputStream stream) throws IOException {
+        //Save the template
+        File tmplFile = TempFile.createTempFile("poi-sxssf-template", ".xlsx");
+        try {
+            FileOutputStream os = new FileOutputStream(tmplFile);
+            try {
+                super.write(os);
+            } finally {
+                os.close();
+            }
+
+            //Substitute the template shared string xml with the temporarily generated xml data file
+            injectSharedStringTableXml(tmplFile, stream);
+        } finally {
+            if (!tmplFile.delete()) {
+                throw new IOException("Could not delete temporary file after processing: " + tmplFile);
+            }
+        }
+    }
+
+    private void injectSharedStringTableXml(File zipfile, OutputStream out) throws IOException {
+        ZipFile zip = new ZipFile(zipfile);
+        DBMappedSharedStringsTable _sst = (DBMappedSharedStringsTable) sharedStringSource;
+        try {
+            ZipOutputStream zos = new ZipOutputStream(out);
+            try {
+                Enumeration<? extends ZipEntry> en = zip.entries();
+                while (en.hasMoreElements()) {
+                    ZipEntry ze = en.nextElement();
+                    zos.putNextEntry(new ZipEntry(ze.getName()));
+                    InputStream is;
+                    if (ze.getName().equals("xl/sharedStrings.xml")) {
+                        is = _sst.getSharedStringInputStream(); //injecting shared string table in target output
+                    } else {
+                        is = zip.getInputStream(ze);
+                    }
+                    copyStream(is, zos);
+                    is.close();
+                }
+            } finally {
+                zos.close();
+                if (!_sst.getTemp_shared_string_file().delete()) {
+                    throw new RuntimeException("Couldn't delete temporary shared strings table file.");
+                }
+            }
+        } finally {
+            zip.close();
+        }
+    }
+
+    private static void copyStream(InputStream in, OutputStream out) throws IOException {
+        byte[] chunk = new byte[1024];
+        int count;
+        while ((count = in.read(chunk)) >= 0) {
+            out.write(chunk, 0, count);
+        }
+    }
+
     /**
      * Create a new CTWorkbook with all values set to default
      */
@@ -394,12 +456,38 @@
         POIXMLProperties.ExtendedProperties expProps = getProperties().getExtendedProperties();
         expProps.getUnderlyingProperties().setApplication(DOCUMENT_CREATOR);
 
-        sharedStringSource = (SharedStringsTable)createRelationship(XSSFRelation.SHARED_STRINGS, XSSFFactory.getInstance());
+        sharedStringSource = createSSTSourceBasedOnSSTType();
-        stylesSource = (StylesTable)createRelationship(XSSFRelation.STYLES, XSSFFactory.getInstance());
+        stylesSource = (StylesTable) createRelationship(XSSFRelation.STYLES, XSSFFactory.getInstance());
 
         namedRanges = new ArrayList<XSSFName>();
         sheets = new ArrayList<XSSFSheet>();
         pivotTables = new ArrayList<XSSFPivotTable>();
+    }
+
+    private SharedStringsTable createSSTSourceBasedOnSSTType() {
+        return (SharedStringsTable) createRelationship(XSSFRelation.SHARED_STRINGS, new POIXMLFactory() {
+            @Override
+            public POIXMLDocumentPart createDocumentPart(POIXMLDocumentPart parent, PackageRelationship rel, PackagePart part) {
+                try {
+                    Class<? extends POIXMLDocumentPart> cls = sharedStringsTableType.getInstance();
+                    Constructor<? extends POIXMLDocumentPart> constructor = cls.getDeclaredConstructor(PackagePart.class, PackageRelationship.class);
+                    return constructor.newInstance(part, rel);
+                } catch (Exception e) {
+                    throw new POIXMLException(e);
+                }
+            }
+
+            @Override
+            public POIXMLDocumentPart newDocumentPart(POIXMLRelation descriptor) {
+                try {
+                    Class<? extends POIXMLDocumentPart> cls = sharedStringsTableType.getInstance();
+                    Constructor<? extends POIXMLDocumentPart> constructor = cls.getDeclaredConstructor();
+                    return constructor.newInstance();
+                } catch (Exception e) {
+                    throw new POIXMLException(e);
+                }
+            }
+        });
     }
 
     /**
Index: maven/poi-ooxml.pom
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- maven/poi-ooxml.pom	(revision 8bf3ebfcee2e2799486de4ed8a5e063616b4083a)
+++ maven/poi-ooxml.pom	(revision )
@@ -69,5 +69,10 @@
       <artifactId>poi-ooxml-schemas</artifactId>
       <version>@VERSION@</version>
     </dependency>
+    <dependency>
+      <groupId>org.mapdb</groupId>
+      <artifactId>mapdb</artifactId>
+      <version>1.0.6</version>
+      </dependency>
   </dependencies>
 </project>
Comment 2 rakesh kumar suthar 2014-12-29 08:32:20 UTC
Created attachment 32334 [details]
patch wiht minor changes
Comment 3 rakesh kumar suthar 2014-12-30 09:06:23 UTC
Created attachment 32335 [details]
[patch-alternate way] attaching patch to Integrating DBMappedSharedStringsTable by using SharedStringsTableType in SXSSFWorkbook

In earlier submitted path using DBMappedSharedStringsTable option (which provide use flexibility to use DBMapped SharedStringsTable) was included in XSSSFWorkbook.

I am attaching alternate solution to Integrate DBMappedSharedStringsTable option by using SharedStringsTableType enum in SXSSFWorkbook where user can specify whether to use default or custom SharedStringsTable soucre. 
This seems more cleaner and appropriate way to integrate it.

Thanks,
Rakesh Kumar Suthar
Comment 4 rakesh kumar suthar 2014-12-30 09:52:20 UTC
Created attachment 32336 [details]
heap usage comparison 1,00,000 unique record of 100 char between default SharedStringTable vs SharedStringsTable Using MapDB
Comment 5 rakesh kumar suthar 2014-12-30 09:54:05 UTC
Created attachment 32337 [details]
heap usage comparison 1,00,000 unique record of 1000 char between default SharedStringTable vs SharedStringsTable Using MapDB
Comment 6 rakesh kumar suthar 2014-12-30 09:55:12 UTC
Created attachment 32338 [details]
heap usage comparison 10,00,000 unique record of 1000 char between default SharedStringTable vs SharedStringsTable Using MapDB
Comment 7 Javen O'Neal 2015-11-28 20:13:55 UTC
Sounds like a cool implementation, though it would add another dependency to the project. It could be bundled into the poi-ooxml jar or could be required as an external dependency.

It might be worth providing all 3 options to the user: internal shared strings table, inline, MappedDB, with inline being default for SXSSFWorkbooks.
Comment 8 PJ Fanning 2018-07-09 18:55:59 UTC
I think there is a reluctance to introduce new jar dependencies.
I intend to release https://github.com/pjfanning/poi-shared-strings as a standalone project when POI 4.0.0 is released.