Bug 63774 - Custom properties "add" method is slow
Summary: Custom properties "add" method is slow
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 4.0.0-FINAL
Hardware: PC Linux
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-09-26 09:09 UTC by serhiy
Modified: 2020-01-08 22:10 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description serhiy 2019-09-26 09:09:30 UTC
Dear developers,

We have recently experienced an issue moving to the latest release. In one of our use cases we are setting a large number of custom properties to an Excel (XLSX) file and it seems that in recent versions something was changed and this process became much slower. Consider the following code:


public class PoiTest {
    public static void main(String[] args) throws IOException {
        try (FileInputStream fis = new FileInputStream("test.xlsx")) {
            XSSFWorkbook workbook = new XSSFWorkbook(fis);

            POIXMLProperties.CustomProperties properties = workbook.getProperties().getCustomProperties();

            IntStream.range(0, 10000).forEach(i -> {
                long ts = System.currentTimeMillis();
                properties.addProperty("property" + i, "value" + 1);
                System.out.println(i + " time taken " + (System.currentTimeMillis() - ts));
            });
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

You will see that time is constantly increasing. The investigation led me to conclude that the cause is the call to 'nextPid()' method in 'CustomProperties' class (defined in 'POIXMLProperties'). If you notice, every time the method is called, it iterates over the whole structure to find out which is the latest pid. In our case, since we do the changes sequentially, we have implemented a workaround in our code base to cache pid, in more general case I am not sure what should be a solution (if any), but performance after some point deteriorates so much that it makes application unusable.

Best regards,
Serhiy.
Comment 1 PJ Fanning 2019-09-26 17:52:26 UTC
Added a solution to track last Pid using r1867597
Comment 2 serhiy 2019-09-26 19:51:25 UTC
Hello, sorry for replying to a resolved issue, but I still have some concerns about the proposed solution. I am not really familiar with XSLSX format and therefore I do not know if "pid" must be unique or not. In case it must be unique, it is maybe better to use AtomicInteger or something similar, as in concurrent code execution there is a chance of two different properties ending up with same pid, otherwise ignore this comment. The second concern I have is "if(contains(name))" in "add" method, according to my experimentation, this method is also slow and leads to a bad performance after reaching around 2000 entries in Custom Properties. Therefore I am wondering if the names should be also cached into a temporary set to improve performance. Sorry once again for bumping the issue. Best regards, Serhiy.
Comment 3 dimatk 2020-01-08 17:34:39 UTC
In which release are these changes?
Comment 4 PJ Fanning 2020-01-08 22:10:00 UTC
should be in 4.1.1