Bug 45104

Summary: Possible Thread Safety Problem in FOP 0.93
Product: Fop - Now in Jira
Reporter: John Lulich <lulich.john>
Component: general
Assignee: fop-dev
Status: NEW ---
Severity: normal
CC: anoop.sehdev
Priority: P3    
Version: 0.93   
Target Milestone: ---   
Hardware: Other   
OS: other   

Description John Lulich 2008-05-30 11:02:30 UTC
Please forgive me, I'm a Java developer, not a System Administrator.  Information I'm providing regarding our systems is to the best of my current knowledge.

Production Environment:
Server: IBM WebSphere Application Server v5.1
Platform: IBM S/390
OS: z/OS
FOP 0.93
Java 1.4

Development Environment:
Computer: Intel Core2 Duo CPU E4500@2.20GHz
2.19GHz, 2.98 GB RAM
OS: Windows XP Professional, 2002, SP2
IDE: WebSphere Developer for System Z (WDz) v.7.0.1
Server: WebSphere v5.1 Test Environment

On your site, http://xmlgraphics.apache.org/fop/0.94/embedding.html#multithreading, you write:  <If you encounter any suspicious behaviour, please notify us.>

Here's my suspicious behavior:

We have been running FOP 0.93 since upgrading in March of 2007.  In late February/early March of 2008, we began receiving a VERY low frequency of reports from users informing us that when they attempt to print PDF documents from our site, they receive complete documents for a different customer.  If they close the PDF and generate it again, they get the correct documents.  So far in 2008, we have 15 incidents reported out of roughly 260,000 hits to this process.
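
For a rough sense of scale, the figures above (15 incidents in roughly 260,000 hits) can be turned into an estimate of how many requests a load test would need before a failure is likely to show up. This is only a sketch: the class and method names are made up for illustration, and it assumes failures are independent and occur at a constant rate, which may well not hold if they cluster under CPU load.

```java
/** Rough estimate of how many requests a load test needs to observe a rare failure. */
final class RareFailureEstimate {

    /** Probability of seeing at least one failure in n independent requests. */
    static double probabilityOfAtLeastOne(double failureRate, long n) {
        // 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p
        return -Math.expm1(n * Math.log1p(-failureRate));
    }

    /** Smallest n for which at least one failure is seen with the given confidence. */
    static long requestsNeeded(double failureRate, double confidence) {
        // Solve 1 - (1 - p)^n >= confidence for n
        return (long) Math.ceil(Math.log1p(-confidence) / Math.log1p(-failureRate));
    }

    public static void main(String[] args) {
        double rate = 15.0 / 260_000.0; // incident count and hit count reported above
        System.out.printf("observed rate: %.6f%%%n", rate * 100);
        System.out.println("requests for a 95% chance: " + requestsNeeded(rate, 0.95));
        System.out.println("chance within 10,000 requests: " + probabilityOfAtLeastOne(rate, 10_000));
    }
}
```

Under those assumptions a test needs roughly 52,000 requests for a 95% chance of hitting the problem even once, so shorter load runs coming up clean do not say much either way.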

We initially assumed that we had a multi-threading problem in our Java code, but after careful review and testing, we determined that is not the case.  We have been running this same Java code with only minor base read changes for nearly three years without similar incident.  Also, our logging process has never captured an error that we could tie to this problem in any way.  Likewise, our general server logs also come up clean.

We have noticed that the majority of these problems have occurred when our mainframe CPUs are getting pushed to (or near) 100%, but some have occurred during timeframes when a spike was not detected (though our tracking tools may not have caught it, since spikes are only logged if they last for a specified amount of time).

Unfortunately, we cannot duplicate this problem at will, either on our mainframe or in our development environment.  Past mainframe load tests have not shown any problems.  The few multi-thread issues we have encountered in code we have written for other projects have been easily found, and we have other programming groups using the same basic Java/JAXB process to gather data and marshal XML--the only difference being that they have switched over to using Crystal Reports to convert the XML to PDF, and this project is the last one still using FOP.  Consequently, we are the only ones running into this problem.

Your statement: "Apache FOP may currently not be completely thread safe" raises some red flags with us.  I'm sorry I can't provide better detail at this time, but as I said, the only evidence of a problem comes in the form of a user's call to tech support, and a faxed or emailed copy of the PDF.

Thanks for your help,
John Lulich
Comment 1 Jeremias Maerki 2008-05-31 02:27:59 UTC
That statement on the website is basically a disclaimer so that everyone using FOP in a multi-threaded environment performs some testing and doesn't just assume they are on the safe side, as this is a complex field. Finding bugs in this area is very difficult, especially if you don't have dedicated hardware for continuous testing (and we don't). We are very vigilant to catch any changes that might introduce multi-threading problems. I also do regular tests in this direction. That doesn't mean there cannot be a remaining multi-threading bug somewhere in the codebase. But frankly, I have a very hard time believing that content from one document can end up in the output file of another processing run. I know the whole FOP code base very well and can almost guarantee that such a thing cannot happen, as there is no shared data concerning the actual text content between individual processing runs.

If you have about 15 incidents in 260'000 hits, I think it should be possible to do some load testing on your system to provoke the failure. I think tools like JMeter (for load testing) and PDFBox (for text extraction) should help you identify such problems and home in on the cause. Good luck!
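
A check of this kind could be sketched roughly as follows: give every request a unique marker, render concurrently, and flag any output that lacks its own marker or carries someone else's. Everything here is hypothetical scaffolding, not FOP or PDFBox API: the `Pipeline` interface stands in for "render with FOP, then extract the text", and the `CUST-nnnnnn` marker format is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CrossTalkLoadTest {

    /** Hypothetical hook: render a document containing the marker and return its extracted text. */
    interface Pipeline {
        String renderAndExtractText(String marker) throws Exception;
    }

    // Fixed-width markers, so one marker can never be a substring of another.
    private static final Pattern MARKER = Pattern.compile("CUST-\\d{6}");

    /** Runs the requests on a thread pool and returns a description of every suspicious result. */
    static List<String> run(Pipeline pipeline, int threads, int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (int i = 0; i < requests; i++) { // supports up to 999,999 requests
                String marker = String.format("CUST-%06d", i);
                pending.add(CompletableFuture.supplyAsync(() -> runOne(pipeline, marker), pool));
            }
            List<String> failures = new ArrayList<>();
            for (CompletableFuture<String> f : pending) {
                String problem = f.join();
                if (problem != null) failures.add(problem);
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }

    private static String runOne(Pipeline pipeline, String marker) {
        try {
            return check(marker, pipeline.renderAndExtractText(marker));
        } catch (Exception e) {
            return marker + ": pipeline threw " + e;
        }
    }

    /** Returns null if the text holds only the expected marker, otherwise a description of the problem. */
    static String check(String marker, String text) {
        boolean sawOwn = false;
        Matcher m = MARKER.matcher(text);
        while (m.find()) {
            if (m.group().equals(marker)) sawOwn = true;
            else return marker + ": found foreign marker " + m.group();
        }
        return sawOwn ? null : marker + ": own marker missing";
    }
}
```

In a real setup the `Pipeline` implementation would go through the same servlet or service path that production uses, since the point is to exercise the whole chain and not just the renderer.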

If anyone else has additional thoughts on this, please share them. I don't know how I could do anything about this particular problem if it's a FOP problem in the first place. I know a number of users who use FOP in a heavily multi-threaded way without any problems. John, I'm afraid you probably have to help yourself in this case (meaning: reproduce the problem as the first step to identifying the cause). Good luck!
Comment 2 John Lulich 2008-06-02 07:57:09 UTC
I kind of assumed that there wouldn't be much help available.  But I wanted to at least put this out there in case anybody might have seen something similar in the past, or in case someone else runs across it in the future.

We have load tested this project many times, to no avail.  Additionally, we identified a few different areas where the data mixup could potentially occur, but those were easily tested under heavy load with no negative results.  It seems like there's some missing piece to the puzzle that we haven't been able to identify yet.  The thinking here is that it might have something to do with how FOP handles things when running on WebSphere on a bogged-down mainframe.  We have occasional CPU spikes, but the majority of these came when we had a configuration mixup that caused our CPU to run at 100% at peak times for a couple weeks.  Outside of that time frame, we've only had a handful of occurrences out of around 230,000 hits.  It's just too few to really get a handle on it.  Besides, our company's direction for PDF generation has shifted away from FOP.  I'm in the process of rewriting this last project now anyway.  I was mostly posting this issue in case others had been experiencing anything similar or if there was just something glaring that you could point out to me.

Thanks for your time.  I truly appreciate your reply.

Best regards--
John Lulich
Comment 3 Anoop Sehdev 2008-07-18 05:39:07 UTC
Hi,

We are using FOP 0.20.5. We have recently started noticing that PDFs we generate sometimes have problems with font embedding.

Sometimes the CIDSystemInfo section gets written as follows:

/CIDSystemInfo << /Registry (/CIDSystemInfo << /Registry (Adobe/CIDSystemInfo << /Registry (AdobeAdobe)/Ordering (UCS)/Ordering ()/Ordering (UCSUCS)/Supplement )/Supplement )/Supplement 000 >> >> >>

whereas it should be:

/CIDSystemInfo << /Registry (Adobe)/Ordering (UCS)/Supplement 0 >>

It clearly shows that there are threading issues. I am not sure if this issue also exists in the latest version of FOP.

Regards
Anoop Sehdev

(In reply to comment #2)

Comment 4 Jeremias Maerki 2008-07-18 06:28:59 UTC
Thanks for reporting that, but the problem is limited to 0.20.5. The code indeed has multi-threading issues:
https://svn.apache.org/viewvc/xmlgraphics/fop/branches/fop-0_20_2-maintain/src/org/apache/fop/pdf/PDFCIDSystemInfo.java?view=markup

Current releases don't have that problem:
https://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/java/org/apache/fop/pdf/PDFCIDSystemInfo.java?view=markup
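
For readers who don't want to dig through the two links, the garbled output in comment #3 is what you would expect when several threads append to one shared buffer. The following is an illustrative sketch of that general pattern and its fix, not a copy of the FOP source; the class and method names are invented for the example, and the linked files remain the authoritative versions.

```java
final class CidSystemInfoSketch {

    // Unsafe pattern: one buffer shared by every thread that serializes the object.
    private static final StringBuffer SHARED = new StringBuffer();

    /** Racy: concurrent callers reset and append to the same buffer, so fragments can interleave. */
    static String toPdfShared(String registry, String ordering, int supplement) {
        SHARED.setLength(0);
        SHARED.append("/CIDSystemInfo << /Registry (").append(registry);
        SHARED.append(")/Ordering (").append(ordering);
        SHARED.append(")/Supplement ").append(supplement).append(" >>");
        return SHARED.toString();
    }

    /** Safe: each call builds its result in a buffer that no other thread can see. */
    static String toPdfLocal(String registry, String ordering, int supplement) {
        StringBuilder sb = new StringBuilder();
        sb.append("/CIDSystemInfo << /Registry (").append(registry);
        sb.append(")/Ordering (").append(ordering);
        sb.append(")/Supplement ").append(supplement).append(" >>");
        return sb.toString();
    }
}
```

Note that the individual `StringBuffer` calls being synchronized does not help in the shared variant: the sequence of reset, appends and `toString()` is not atomic as a whole, so another thread can slip in between any two steps.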

(In reply to comment #3)
Comment 5 Glenn Adams 2012-04-07 01:42:07 UTC
resetting P2 open bugs to P3 pending further review