Bug 44358

Summary: OutOfMemory
Product: Fop - Now in Jira
Reporter: Mihail Laftchiyski <mihail>
Component: pdf
Assignee: fop-dev
Status: NEW
Severity: normal
Priority: P3
Version: 0.93
Target Milestone: ---
Hardware: All
OS: All

Description Mihail Laftchiyski 2008-02-05 10:45:01 UTC
When testing the fop.war application, we run out of memory while generating a large
PDF (560 pages). Environment: WAS 5.1.1.13, JRE 1.4.2 (SR9). After the first
run we observe that a large portion of the memory is not garbage-collected. The heap
analysis shows that the following hierarchy holds 64% of the memory:
PrimaryGridUnit
  TableCellLayoutManager
    TableLayoutManager
      TableContentLayout
        TableRowIterator
Comment 1 Andreas L. Delmelle 2008-02-05 10:55:30 UTC
Are you at liberty to try out more recent versions of FOP (0.94 or even FOP Trunk), and see if that gets you 
any further? Some enhancements have been made, and the table-related code in FOP Trunk has changed 
a lot. It would be much appreciated if you could report back on that.

Apart from that: is it possible to alter the structure of the input FO? From what you're describing, it seems
your FO document consists of one gigantic table (or at least contains a page-sequence with a table that
spans many pages). Correct? If so, you may be out of luck, but that depends on whether you really need
to have the table completely in one page-sequence...
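For illustration, a minimal sketch of the multiple-page-sequence structure suggested above. The page-master name and dimensions are assumptions, and the table content is elided; the point is that FOP can release each fo:page-sequence after it is rendered, instead of holding one 560-page sequence in memory:

```xml
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
                           page-height="29.7cm" page-width="21cm">
      <fo:region-body region-name="xsl-region-body"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <!-- One page-sequence per chunk of rows, instead of a single
       page-sequence holding the entire table: -->
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <fo:table table-layout="fixed" width="100%">
        <fo:table-body><!-- first chunk of rows --></fo:table-body>
      </fo:table>
    </fo:flow>
  </fo:page-sequence>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <fo:table table-layout="fixed" width="100%">
        <fo:table-body><!-- next chunk of rows --></fo:table-body>
      </fo:table>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```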
Comment 2 Mihail Laftchiyski 2008-02-05 11:11:16 UTC
Thanks for the suggestion. 

We tried FOP 0.94 with the XML/XSL that we use to test 0.93 -- unfortunately
we could not get it working. It appears to sit in an endless loop and
does not produce any output PDF.

Could you please advise us on the status of the FOP Trunk code? Thanks!


Comment 3 Mihail Laftchiyski 2008-02-05 12:28:54 UTC
A few more comments in the hope that they clarify the situation. Any help
and/or suggestions would be highly appreciated.


1) We have tried FOP 0.94. The stylesheet that runs fine with 0.93 runs into an
infinite loop with 0.94. I'm not sure why this happens, but when I searched on
Google, I found that other users have run into this problem as well. I haven't
been able to find a solution to this.

2) The FO stylesheet does have a page sequence with a table that can span a lot 
of pages based on the data. 

Could you please let us know why this is a problem? If this is a known problem, can
you let us know if there are any workarounds or fixes to get by this? We would
appreciate it if you could send us some examples with alternate implementations so
we can try those out.
Comment 4 Chris Bowditch 2008-02-06 01:11:08 UTC
1) There was a bug in the Layout code that meant an infinite loop could be 
entered for certain FO. IIRC, preserve-whitespace property had something to do 
with it. This has been fixed in the Trunk code.
2) Long page-sequences are a problem for FOP because FOP uses a total-fit
algorithm to lay out the FO. This has the advantage of achieving more elegant
layout, but the disadvantage of requiring the whole page-sequence to be kept in
memory; it is only released when the whole page-sequence has been finished.
I think forward references via fo:page-number-citation can also cause FOP to
hang onto more objects in memory until the forward reference is resolved.
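For illustration, a minimal FO fragment showing the kind of forward reference meant here (the id "chap9" is an assumed example name):

```xml
<!-- A forward reference: this citation occurs before the block carrying
     id="chap9" has been laid out, so FOP must keep page data in memory
     until the id is resolved. -->
<fo:block>Chapter 9 begins on page
  <fo:page-number-citation ref-id="chap9"/>.
</fo:block>

<!-- ...many pages later, the reference target: -->
<fo:block id="chap9">Chapter 9</fo:block>
```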
Comment 5 Mihail Laftchiyski 2008-02-06 09:18:48 UTC
Thanks for the response. 

We are using the preserve-whitespace property in our stylesheets. That could be
causing the infinite loop. We don't want to go to production with Trunk code,
but if you could please point out the classes from the Trunk code that fixed
this problem, I can get those, include them in the 0.94 jar, and run our
stylesheets. Let us know.

Comment 6 Vincent Hennebert 2008-02-07 03:23:37 UTC
(In reply to comment #5)

Edit the src/java/org/apache/fop/layoutmgr/table/TableContentLayoutManager.java
file and delete lines 145 and 166, which correspond to the following lines of code:
    ElementListUtils.removeLegalBreaks(this.headerList);
    ElementListUtils.removeLegalBreaks(this.footerList);
They are causing the infinite loop. Removing them is totally safe and won't
affect the quality of the layout.

Vincent

Comment 7 Mihail Laftchiyski 2008-02-13 07:19:09 UTC
Hi all:

We modified FOP 0.94 as Vincent suggested (thanks!). This resolved the infinite
loop problem. However, we still get the OutOfMemory exception. I am wondering
if the FOP framework holds a large amount of memory in static class variables.
I also noticed that a very large number of fop.fo.StaticPropertyList
objects are created (35000); each of those objects holds two arrays of size 252.
This correlates with the heap-dump analysis we performed: a large number of
arrays without a parent, which the GC cannot dispose of. (I am under the
impression that these objects get allocated in a recursive manner.)
Would it be possible to reduce the number of arrays using a static variable or an
object cache?

Thank you all for the prompt response!

Regards
Mihail
    
Comment 8 Andreas L. Delmelle 2008-02-13 11:11:13 UTC
(In reply to comment #7)
> 
> We modified FOP 0.94 as Vincent suggested (thanks!). This resolved the infinite 
> loop problem.  However we still get the OutOfMemory exception.  

OK, thanks a lot for trying this out. Can you judge whether the exception occurred sooner or later than 
with FOP 0.93?

> I am wondering if the FOP framework holds large amount of memory in static class variables.

Not that I'm aware of. There are some static variables in the property classes, but those only serve to 
reduce the footprint. (caches that are shared between different FOs in the same document, or even in 
different documents if they are processed concurrently in the same JVM)

> Also I noticed that there are a very large number of fop.fo.StaticPropertyList 
> objects created - 35000

That means your FO contains 35000 objects, which is not abnormal for larger documents. If those are 
all inside the same page-sequence, there is only little you can do for the moment, apart from making 
sure that such an FO document is never generated in the first place. This could be done by 
restructuring the stylesheet to generate multiple page-sequences, but we realize that this is not always 
possible.

In the old days, those PropertyLists were never released. Being attached to the corresponding FONode
(hard member reference), they were only released when that FONode was no longer referenced.
Currently, they are more like a window from which the relevant properties are transferred to the
FONode during parsing. As soon as the endElement() event occurs for that FONode, the PropertyList
goes out of scope and should theoretically be 100% garbage-collectable (including the backing arrays).

> ; each of those objects holds two arrays - size 252.  

Yep. 252 is the total number of possible properties.

> This correlates with the heapdump analyses we performed - large number of 
> arrays without parent and the GC cannot dispose them. (I am under the 
> impression that these objects get allocated in a recursive manner.)

Correct, although the number of StaticPropertyLists to which there exists a hard reference will be 
determined not by the number of FOs, but by the nesting level. If you have a document with a 
maximum nesting depth of 10 nodes, then there will be at most 10 StaticPropertyList instances alive at 
any given point during the processing. I've literally seen this with my own eyes during a profiling 
session.

Is it normal that the backing arrays are not collected? I'd think not, but I'm not 100% sure.

Which JVM is used? Are you using Sun's implementation, or an IBM JVM? Is there a way to rule out the 
possibility of the GC algorithm being at fault here? Can you try other Java Runtimes?

> Would it be possible to reduce the number of arrays using static var or object 
> cache?

Not really, I think... The properties themselves are already cached for a large part, i.e. a simple 
FixedLength with a value of "10pt" will be the same instance for all occurrences in the document.
Initially, each FixedLength is a separate instance, but we check immediately whether we already have 
one cached with the same value. If that is the case, then the separate instance exists purely on the 
stack, and is substituted with the cached instance before it is attached/bound to the FONode.
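A sketch of the canonical-instance caching described above. The class and method names here are illustrative stand-ins, not FOP's actual API; the point is that all occurrences of the same value share one cached instance:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of property caching: a lookup key (the value
// string) maps to a single shared instance, so "10pt" is represented
// by one object no matter how often it occurs in the document.
public final class FixedLength {
    private static final Map<String, FixedLength> CACHE =
            new ConcurrentHashMap<>();

    private final String value;

    private FixedLength(String value) {
        this.value = value;
    }

    /** Returns the shared instance for this value, creating it on first use. */
    public static FixedLength valueOf(String value) {
        return CACHE.computeIfAbsent(value, FixedLength::new);
    }

    public String getValue() {
        return value;
    }
}
```

With this pattern, `FixedLength.valueOf("10pt")` always returns the same object, which is what keeps the per-property memory footprint small even for very large documents.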
Comment 9 Andreas L. Delmelle 2008-02-14 15:12:35 UTC
Small update:
I've been browsing around, and may have found a possible cause of the arrays not being collected.
Theoretically, an implementation of a tracing GC algorithm could still view the arrays
as reachable if their first element is still strongly reachable... and the first element here is the default
absolute-position property, which is possibly referenced by a significant number of FObjs.

Maybe you could try something like:
* add an empty protected cleanup() method to org.apache.fop.fo.PropertyList
* add an override for this method to StaticPropertyList, and explicitly null out the first element of the member arrays
* in FOTreeBuilder.MainFOHandler.endElement(), inside the if-block right after currentFObj.endOfNode(), add currentPropertyList.cleanup() as the first line
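The three steps above can be sketched as follows, using simplified stand-ins for FOP's PropertyList/StaticPropertyList; the real classes differ in detail, and the field names here are illustrative:

```java
// Simplified sketch of the proposed cleanup() hook.
public class CleanupSketch {

    public static class PropertyList {
        /** Step 1: empty hook; the base class has nothing to clean up. */
        protected void cleanup() { }
    }

    public static class StaticPropertyList extends PropertyList {
        // In FOP, arrays like these hold the specified and computed
        // property values, one slot per possible property (252 of them).
        public final Object[] explicit = new Object[252];
        public final Object[] values = new Object[252];

        /**
         * Step 2: null out the first element of each backing array, so a
         * shared default (e.g. the default absolute-position property)
         * cannot keep the whole array strongly reachable.
         */
        @Override
        protected void cleanup() {
            explicit[0] = null;
            values[0] = null;
        }
    }
}
```

Step 3 would then call currentPropertyList.cleanup() in FOTreeBuilder.MainFOHandler.endElement(), right after currentFObj.endOfNode().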

Comment 10 Stephan Thesing 2010-01-17 23:21:03 UTC
This problem is still present in the FOP trunk version.
As I see it, several things contribute to the huge memory footprint:
 1) If the document contains forward references (e.g. in the TOC, or in
    a "page X of Y" footer), then FOP keeps all page-sequences around,
    together with the Areas, until the references have been resolved.
 2) Even when a page in a sequence has been rendered, the FO that generated
    that page is kept around, which is especially bad for tables, as they
    generate quite a lot of objects for borders, etc.
    It should be possible to discard an FO object as soon as all pages it
    contributes areas to have been rendered.
 3) At several places, page-sequences are kept around solely for computing
    things like the number of pages generated so far from them.
    It suffices to keep the needed information (e.g. the number of pages) around.

On fop-dev a two-pass approach has been discussed in order to solve 1):
  In the first pass, no pages are rendered; only layout is done. All unknown
    references are treated as "XXX" (as they are now). The definitions of
    IDs are recorded for the second pass with the corresponding page number.
  In the second pass, the definitions from the first pass are used to resolve
    references. When the definitions are encountered again in the second pass,
    it is checked that they correspond to the ones made in the first pass.
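The two-pass scheme above can be sketched as follows. This is an illustrative outline, not FOP code; class and method names are assumptions:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of two-pass reference resolution: pass 1 lays out with "XXX"
// placeholders and records where each id landed; pass 2 resolves
// citations from those recordings and checks the ids land on the same
// pages again.
public class TwoPassResolver {
    private final Map<String, Integer> firstPassPages = new HashMap<>();

    /** Pass 1: record the page on which an id was defined. */
    public void recordDefinition(String id, int page) {
        firstPassPages.put(id, page);
    }

    /** Pass 2: resolve a citation; unknown ids fall back to "XXX". */
    public String resolve(String refId) {
        Integer page = firstPassPages.get(refId);
        return page != null ? page.toString() : "XXX";
    }

    /** Pass 2: verify a definition still lands on the recorded page. */
    public boolean verifyDefinition(String id, int page) {
        Integer recorded = firstPassPages.get(id);
        return recorded != null && recorded == page;
    }
}
```

Because pass 2 only needs the id-to-page map, no page-sequence has to be retained between the passes, which is what addresses point 1).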

For 2) and 3), an analysis and redesign has to be done in order to find all
places where Area or FO information for a page is kept beyond the point when the page has been rendered.
Comment 11 Glenn Adams 2012-04-07 01:37:26 UTC
resetting severity from major to normal pending further review
Comment 12 Glenn Adams 2012-04-07 01:39:43 UTC
resetting P1 open bugs to P3 pending further review