Bug 59663 - Please make READING a workbook thread safe
Summary: Please make READING a workbook thread safe
Status: RESOLVED WONTFIX
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: 3.14-FINAL
Hardware: All All
Importance: P2 enhancement
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-06-03 20:04 UTC by m.kurz
Modified: 2023-03-11 09:19 UTC



Attachments
example (4.22 KB, text/plain)
2016-06-03 20:04 UTC, m.kurz

Description m.kurz 2016-06-03 20:04:05 UTC
Created attachment 33910
example

Preface:
I did read the FAQ - where it says "Accessing the same document in multiple threads will not work."
Also I read the linked discussion (https://mail-archives.apache.org/mod_mbox/poi-user/201109.mbox/%3C1314859350817-4757295.post@n5.nabble.com%3E).
---

In most of the discussions I read about thread safety in POI people talk about creating/writing the same document via different threads. I completely understand that making WRITING thread safe isn't trivial and probably has many many pitfalls (apart from performance implications) so that's why it isn't implemented in POI (right now).

However, what I am wondering is whether it would be possible to make POI thread safe for reading the same worksheet from multiple threads in parallel.
We have quite large Excel files which we only need to read (including evaluating a lot of cell formulas, etc.).
Being able to read a workbook with multiple concurrent threads would speed things up a lot for us - and probably for other people as well.

For me - as someone who doesn't know the codebase and its architecture - my first thought was that all that needs to be done is to make some caches thread safe (by using ConcurrentHashMaps instead of normal HashMaps) and maybe some other minor tweaks...

E.g. today, I implemented a small app which tries to read a worksheet via multiple threads concurrently - and of course it failed with this exception:
----
Caused by: java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode
	at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1819)
	at java.util.HashMap$TreeNode.treeify(HashMap.java:1936)
	at java.util.HashMap.treeifyBin(HashMap.java:771)
	at java.util.HashMap.putVal(HashMap.java:643)
	at java.util.HashMap.put(HashMap.java:611)
	at org.apache.poi.ss.formula.PlainCellCache.put(PlainCellCache.java:84)
	at org.apache.poi.ss.formula.EvaluationCache.getPlainValueEntry(EvaluationCache.java:136)
	at org.apache.poi.ss.formula.EvaluationTracker.acceptPlainValueDependency(EvaluationTracker.java:145)
	at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny(WorkbookEvaluator.java:242)
	at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateReference(WorkbookEvaluator.java:702)
	at org.apache.poi.ss.formula.SheetRefEvaluator.getEvalForCell(SheetRefEvaluator.java:48)
	at org.apache.poi.ss.formula.SheetRangeEvaluator.getEvalForCell(SheetRangeEvaluator.java:74)
	at org.apache.poi.ss.formula.LazyAreaEval.getRelativeValue(LazyAreaEval.java:51)
	at org.apache.poi.ss.formula.eval.AreaEvalBase.getValue(AreaEvalBase.java:131)
	at org.apache.poi.ss.formula.functions.MultiOperandNumericFunction.collectValues(MultiOperandNumericFunction.java:151)
	at org.apache.poi.ss.formula.functions.MultiOperandNumericFunction.getNumberArray(MultiOperandNumericFunction.java:128)
	at org.apache.poi.ss.formula.functions.MultiOperandNumericFunction.evaluate(MultiOperandNumericFunction.java:90)
	at org.apache.poi.ss.formula.OperationEvaluatorFactory.evaluate(OperationEvaluatorFactory.java:132)
	at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateFormula(WorkbookEvaluator.java:503)
	at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny(WorkbookEvaluator.java:263)
	at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateReference(WorkbookEvaluator.java:702)
	at org.apache.poi.ss.formula.SheetRefEvaluator.getEvalForCell(SheetRefEvaluator.java:48)
	at org.apache.poi.ss.formula.SheetRangeEvaluator.getEvalForCell(SheetRangeEvaluator.java:74)
	at org.apache.poi.ss.formula.LazyRefEval.getInnerValueEval(LazyRefEval.java:43)
	at org.apache.poi.ss.formula.eval.OperandResolver.chooseSingleElementFromRef(OperandResolver.java:179)
	at org.apache.poi.ss.formula.eval.OperandResolver.getSingleValue(OperandResolver.java:62)
	at org.apache.poi.ss.formula.eval.TwoOperandNumericOperation.singleOperandEvaluate(TwoOperandNumericOperation.java:29)
	at org.apache.poi.ss.formula.eval.TwoOperandNumericOperation.evaluate(TwoOperandNumericOperation.java:36)
	at org.apache.poi.ss.formula.functions.Fixed2ArgFunction.evaluate(Fixed2ArgFunction.java:33)
	at org.apache.poi.ss.formula.OperationEvaluatorFactory.evaluate(OperationEvaluatorFactory.java:119)
	at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateFormula(WorkbookEvaluator.java:503)
	at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny(WorkbookEvaluator.java:263)
	at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateReference(WorkbookEvaluator.java:702)
	at org.apache.poi.ss.formula.SheetRefEvaluator.getEvalForCell(SheetRefEvaluator.java:48)
	at org.apache.poi.ss.formula.SheetRangeEvaluator.getEvalForCell(SheetRangeEvaluator.java:74)
	at org.apache.poi.ss.formula.LazyAreaEval.getRelativeValue(LazyAreaEval.java:51)
	at org.apache.poi.ss.formula.LazyAreaEval.getRelativeValue(LazyAreaEval.java:45)
	at org.apache.poi.ss.formula.eval.AreaEvalBase.getValue(AreaEvalBase.java:128)
	at org.apache.poi.ss.formula.functions.LookupUtils$ColumnVector.getItem(LookupUtils.java:100)
	at org.apache.poi.ss.formula.functions.Vlookup.evaluate(Vlookup.java:59)
	at org.apache.poi.ss.formula.functions.Var3or4ArgFunction.evaluate(Var3or4ArgFunction.java:36)
	at org.apache.poi.ss.formula.OperationEvaluatorFactory.evaluate(OperationEvaluatorFactory.java:132)
	at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateFormula(WorkbookEvaluator.java:503)
	at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny(WorkbookEvaluator.java:263)
	at org.apache.poi.ss.formula.WorkbookEvaluator.evaluate(WorkbookEvaluator.java:205)
	at org.apache.poi.hssf.usermodel.HSSFFormulaEvaluator.evaluateFormulaCellValue(HSSFFormulaEvaluator.java:374)
	at org.apache.poi.hssf.usermodel.HSSFFormulaEvaluator.evaluate(HSSFFormulaEvaluator.java:202)
----
This exception could be fixed by making _plainValueEntriesByLoc a ConcurrentHashMap in https://github.com/apache/poi/blob/REL_3_14_FINAL/src/java/org/apache/poi/ss/formula/PlainCellCache.java#L81
I had a quick look in the codebase and it looks like there are some more caches which probably could just be changed to a ConcurrentHashMap...
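
As a rough illustration of the suggestion above, here is a minimal sketch of a location-keyed cache backed by a ConcurrentHashMap and filled from several threads. The class ConcurrentCacheSketch, its key encoding and the fillConcurrently helper are hypothetical names made up for this example; this is not POI's actual PlainCellCache code. It only shows that the map itself tolerates concurrent put calls, not that the rest of the formula evaluator would become thread safe.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentCacheSketch {

    // Hypothetical stand-in for a per-cell value cache keyed by cell location.
    private final Map<Long, String> entriesByLoc = new ConcurrentHashMap<>();

    // Assumed key encoding for this sketch: row in the high 32 bits, column in the low 32 bits.
    static long key(int row, int col) {
        return ((long) row << 32) | (col & 0xFFFFFFFFL);
    }

    void put(int row, int col, String value) {
        entriesByLoc.put(key(row, col), value);
    }

    String get(int row, int col) {
        return entriesByLoc.get(key(row, col));
    }

    int size() {
        return entriesByLoc.size();
    }

    // Starts `threads` workers that each insert `rowsPerThread` distinct rows,
    // waits for all of them, and returns the number of cached entries.
    static int fillConcurrently(int threads, int rowsPerThread) {
        ConcurrentCacheSketch cache = new ConcurrentCacheSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int offset = t * rowsPerThread;
            workers[t] = new Thread(() -> {
                for (int r = 0; r < rowsPerThread; r++) {
                    cache.put(offset + r, 0, "v" + (offset + r));
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return cache.size();
    }

    public static void main(String[] args) {
        // Every key is distinct, so no insert should be lost.
        System.out.println(fillConcurrently(8, 10_000));
    }
}
```

With a plain HashMap in place of the ConcurrentHashMap, the same workload may lose entries or fail with an internal exception like the one in the stack trace above.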


What do you think?
Is there a chance to make this work? Or am I completely wrong?
Comment 1 Dominik Stadler 2023-03-11 09:19:33 UTC
I don't think there are any plans to do this. 

The sample exception that you list shows the types of problems that we would run into. Any lazily initialized data structure would be a potential source of errors.

Sometimes the error message will be much harder to interpret, or there might even be very strange incorrect behavior without any indication that multi-threaded access is the culprit.
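
To make the lazy-initialization concern concrete, here is a small sketch of the general pattern. The classes UnsafeLazy and SafeLazy, the BUILDS counter and the sheet names are invented for illustration and are not taken from the POI codebase. The unsafe variant is the usual "check for null, then build" idiom, where two threads can both see null and both build; the safe variant guards the same logic with synchronized, which every caller then pays for, including single-threaded ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyInitSketch {

    // Counts how often the (hypothetical) expensive structure gets built.
    static final AtomicInteger BUILDS = new AtomicInteger();

    static List<String> build() {
        BUILDS.incrementAndGet();
        List<String> names = new ArrayList<>();
        names.add("Sheet1");
        names.add("Sheet2");
        return names;
    }

    // Racy: without synchronization, concurrent callers may each build their own copy.
    static class UnsafeLazy {
        private List<String> names;

        List<String> names() {
            if (names == null) {
                names = build();
            }
            return names;
        }
    }

    // Safe: the check and the assignment happen under one lock.
    static class SafeLazy {
        private List<String> names;

        synchronized List<String> names() {
            if (names == null) {
                names = build();
            }
            return names;
        }
    }

    // Calls SafeLazy.names() from `threads` threads and returns how many builds happened.
    static int buildsWithSafeLazy(int threads) {
        BUILDS.set(0);
        SafeLazy lazy = new SafeLazy();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> lazy.names());
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return BUILDS.get();
    }

    public static void main(String[] args) {
        System.out.println("builds with SafeLazy: " + buildsWithSafeLazy(16));
    }
}
```

Every such lazily built field in a library would need this kind of guard (or an equivalent) before concurrent reads could be called safe.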

Therefore I am closing this as WONTFIX for now. Even if someone came up with some initial patches, I would vote against changing the official stance on multi-threading, as such a guarantee would come with an additional maintenance burden that very likely no one is willing to take on.

Lastly, any such change would likely have some performance impact on current single-threaded usages of the code, creating new problems for users who already process large documents today.