Bug 46048

Summary: Wrong images used (how to clear image cache?)
Product: Fop - Now in Jira
Reporter: M.H. <mhilpert>
Component: images
Assignee: fop-dev
Status: REOPENED ---    
Severity: normal    
Priority: P3    
Version: 0.95   
Target Milestone: ---   
Hardware: PC   
OS: Windows 2000   
Attachments: Proposed patch against FOP Trunk for URI pre-resolution

Description M.H. 2008-10-21 02:30:36 UTC
The problem: if an (SVG) image with the same file name (but a different file path!) is generated more than once, FOP always uses the (wrong) first image.

We use relative paths in our XSLs to reference SVG images. This worked well in FOP 0.20.5 but doesn't work in FOP 0.95 anymore (well, it works with FOP 0.95 when called via FOP.bat but it doesn't work when called from within a Java app via the FOP Java API). So I wrote a custom URIResolver to change the file name to the current unique file path. This solves the problem of relative image paths not working (so far).

However, if a subsequent document generates the same report with the same file but with different data, FOP doesn't use the newly generated file content but the old image of the first report. I guess the internal image cache doesn't use the resolved image file name but the first generated one.

Example:

1. XSL content:

<svg:image width="170mm" height="120mm" xlink:href="C_PerfRiskCons_M.svg" xmlns:xlink="http://www.w3.org/1999/xlink"/>

2. custom URIResolver changes

  'file:/D:/Tmp/iComps/amc/reports/C_PerfRiskCons_M.svg' 

to

  'file:///D:\Tmp\iComps\amc\reports\dVwIIqKYfobFQDzUFJDQ5Er60ovA0G7YMpAVypnaMhY=\C_PerfRiskCons_M.svg'.

with "dVwIIqKYfobFQDzUFJDQ5Er60ovA0G7YMpAVypnaMhY=" being a unique GUID for each report.
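
For illustration, the kind of rewriting described in step 2 could look roughly like the sketch below. It uses only the standard JAXP URIResolver interface; the class name ReportDirResolver and its constructor argument are hypothetical and not taken from the reporter's application.

```java
import java.io.File;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical resolver: redirects an image reference into a per-report directory.
class ReportDirResolver implements URIResolver {
    private final File reportDir; // e.g. .../reports/<unique report id>

    ReportDirResolver(File reportDir) {
        this.reportDir = reportDir;
    }

    @Override
    public Source resolve(String href, String base) {
        // Keep only the file name of the requested image ...
        String name = href.substring(href.lastIndexOf('/') + 1);
        // ... and look it up in the directory that belongs to this report.
        File target = new File(reportDir, name);
        // StreamSource(File) sets the system ID to the file's URI.
        return new StreamSource(target);
    }
}
```

Note that this only changes where the image is loaded from; it says nothing about which key the image cache uses.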

=> it seems FOP 0.95 uses   'file:/D:/Tmp/iComps/amc/reports/C_PerfRiskCons_M.svg' for the image cache which would explain the faulty behaviour.

I tried to work around this by serializing report generation and clearing the image cache before each report, but there is no org.apache.fop.image.FopImageFactory.resetCache() anymore in FOP 0.95 and I didn't find any other resetCache() method in the API.

How can I work around this?
Comment 1 Jeremias Maerki 2008-10-21 02:42:06 UTC
Have you tried the suggested work-around at http://xmlgraphics.apache.org/fop/stable/graphics.html#caching already (adding a unique dummy URL parameter)?
Comment 2 M.H. 2008-10-21 02:51:09 UTC
(In reply to comment #1)
> Have you tried the suggested work-around at
> http://xmlgraphics.apache.org/fop/stable/graphics.html#caching already (adding
> a unique dummy URL parameter)?

No, because I can't set a unique URI in the XSL for each report generation. I try to achieve this with my custom URIResolver. But I noticed that for the second call (i.e. the second report) the custom URIResolver's resolve() is not called!!! This is another hint that the image cache uses the URI as it was before the custom URIResolver changed it! It would explain why the image cache looks up the (non-unique) URI, finds it there and doesn't need to call the custom URIResolver.resolve(). If the cache used the URI resolved by the custom URIResolver, this would probably work.
Comment 3 Jeremias Maerki 2008-10-21 03:05:35 UTC
Ok. What you describe is all expected and correct behaviour. Obviously, it doesn't cover 100% of all requirements. I must say I'm not ready to believe that it wouldn't be possible to add a unique value for the report. You managed in the URIResolver.

One idea could be to add a set of regular expressions to match URIs that should not be cached. We've also talked about an extension attribute on fo:external-graphic to disable the cache for certain images. But that's all not implemented, yet.

To put you out of your misery ;-) here's the code to clear the image cache:
fopFactory.getImageManager().getCache().clearCache();
Comment 4 Vincent Hennebert 2008-10-21 03:18:08 UTC
Doesn't the cache check for the modification date of file: URIs? Seems like a
natural thing to do.
Comment 5 Jeremias Maerki 2008-10-21 03:23:26 UTC
(In reply to comment #4)
> Doesn't the cache check for the modification date of file: URIs? Seems like a
> natural thing to do.
> 

No. Natural it may be if you only look at file URLs, but not all URLs provide a modification date. And we're actually working with URIs, not URLs, which don't have a modification date. Maybe this can be improved. Experiments welcome.
Comment 6 Vincent Hennebert 2008-10-21 03:31:39 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > Doesn't the cache check for the modification date of file: URIs? Seems like a
> > natural thing to do.
> > 
> 
> No. Natural it may be if you only look at file URLs, but not all URLs provide a
> modification date. And we're actually working with URIs, not URLs, which don't
> have a modification date. Maybe this can be improved. Experiments welcome.

That's why I said /file:/ URIs. Something like:
if ("file".equals(uri.getScheme())) {
    // check the modification date of the corresponding file
}
Doesn't seem complicated, but I'm obviously missing the big picture.
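
A slightly fuller sketch of that idea, using only the JDK: remember the last-modified stamp seen for each file: URI and report a change on the next lookup. FileStampCheck and isStale are hypothetical names, and nothing here reflects how FOP's image cache is actually implemented.

```java
import java.io.File;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of FOP: tracks last-modified stamps of file: URIs.
class FileStampCheck {
    private final Map<URI, Long> seen = new HashMap<>();

    /** Returns true if the file behind this URI changed since the previous call. */
    boolean isStale(URI uri) {
        if (!"file".equals(uri.getScheme())) {
            return false; // other schemes offer no modification date to compare
        }
        long stamp;
        try {
            stamp = new File(uri).lastModified(); // 0L if the file does not exist
        } catch (IllegalArgumentException e) {
            return false; // e.g. an opaque URI such as file:chart.svg
        }
        Long previous = seen.put(uri, stamp);
        return previous != null && previous != stamp;
    }
}
```

As comment 5 points out, this would only cover file URLs; URNs and privately resolved URIs have no timestamp to check.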
Comment 7 M.H. 2008-10-21 03:34:37 UTC
As there is no method to clear the image cache, I now have a working workaround:
in my custom "FOP" class I introduced a new constructor to create a new FopFactory to finally get rid of all cached images:

-----------------------
    /**
     * Constructor.
     * 
     * Workaround for the image cache problem: each FopFactory has its own image cache. As a custom URIResolver (to set unique image file names)
     * is not considered by the image cache (FOP 0.95), the image cache must be cleared. But as there is no such clear method anymore (FOP 0.95),
     * we create a completely new FopFactory.
     * 
     * This constructor should be called with parameter 'true' for serialized FOP calls to avoid image caching problems.
     * 
     * @param newFactory If true, create a new FopFactory and try to copy config values from the former FopFactory and UserAgent.
     */
    public FOP(final boolean newFactory) throws Exception {
        super();

        if (newFactory) {
            final FopFactory ff = FopFactory.newInstance();
            
            final FOUserAgent ua = ff.newFOUserAgent();
            ua.setBaseURL(fopUserAgent.getBaseURL());
            ua.setURIResolver(fopUserAgent.getURIResolver());
            
            ff.setStrictValidation(fopFactory.validateStrictly());
            if (fopFactory.getFontBaseURL() != null) {
                ff.setFontBaseURL(fopFactory.getFontBaseURL());
            }
            ff.setUserConfig(fopFactory.getUserConfig());
            
            fopFactory = ff;
        }
    }//FOP()
-----------------------------------

I guess the problem is the image caching of FOP not taking custom URIResolvers into account.

(Why is this bug "resolved worksforme"? This is a clear bug as I described it ...)

Comment 8 Jeremias Maerki 2008-10-21 04:20:50 UTC
(In reply to comment #7)
> As there is no method to clear the image cache, I now have a working
> workaround:
> in my custom "FOP" class I introduced a new constructor to create a new
> FopFactory to finally get rid of all cached images:
> 
<snip/>

> I guess the problem is the image caching of FOP not taking custom URIResolvers
> into account.
> 
> (Why is this bug "resolved worksforme"? This is a clear bug as I described it
> ...)
> 

Would you care to look at my reply #3 again? I gave you the code necessary to clean the image cache. Here it is again:
fopFactory.getImageManager().getCache().clearCache();

Comment 9 M.H. 2008-10-21 04:22:26 UTC
(In reply to comment #3)
> To put you out of your misery ;-) here's the code to clear the image cache:
> fopFactory.getImageManager().getCache().clearCache();

Thanks, that also did the trick!

Is there hope to fix this issue when using custom URIResolvers?
Comment 10 Jeremias Maerki 2008-10-21 04:35:22 UTC
(In reply to comment #9)
> (In reply to comment #3)
> > To put you out of your misery ;-) here's the code to clear the image cache:
> > fopFactory.getImageManager().getCache().clearCache();
> 
> Thanks, that also did the trick!
> 
> Is there hope to fix this issue when using custom URIResolvers?
> 

Not the way you thought. A URIResolver returns a JAXP Source object and that can't be cached. It's not even guaranteed that the resulting Source object has a system ID. I wouldn't even know where to start to approach this the way you explained. If there's anything that can be improved then it's either looking at Vincent's proposal about checking the last modified date for file URLs (which would only solve this special case) or bypassing caching for certain URIs as I suggested. The URIResolvers are completely irrelevant for image caching, they just provide access to the actual resource when given a URI.
Comment 11 M.H. 2008-10-21 04:48:06 UTC
(In reply to comment #10)
> The URIResolvers are completely irrelevant for image caching, they
> just provide access to the actual resource when given a URI.

Thanks for clearing this up! Now I can stop playing around with the URIResolver in the hope of getting it fixed somehow.

Then I don't understand your comment #3: as I can't write a unique path to the XSL (as the XSL never changes because it is the layout of the report), we use relative paths (as they worked with FOP 0.20.5 Java API flawlessly). These relative paths result in the very same image file name for the same report but other data. The only workaround I see here, is to make an additional XML transformation of the XSL to find such relative paths and replace them with temporary full paths, which is not very elegant.

I wonder, how I can get FOP working to process multiple documents in multiple threads. I guess, the only promising approach so far (FOP 0.95) is, to use new FopFactories and UserAgents for each thread and each report generated. But the note in http://xmlgraphics.apache.org/fop/0.94/embedding.html#multithreading ("Apache FOP may currently not be completely thread safe.") is not very encouraging.
Comment 12 Jeremias Maerki 2008-10-21 05:12:39 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > The URIResolvers are completely irrelevant for image caching, they
> > just provide access to the actual resource when given a URI.
> 
> Thanks for clearing this up! Now I can stop playing around with the
> URIResolver in the hope of getting it fixed somehow.
> 
> Then I don't understand your comment #3: as I can't write a unique path to the
> XSL (as the XSL never changes because it is the layout of the report), 

Hint: XSLT parameters

> we use
> relative paths (as they worked with FOP 0.20.5 Java API flawlessly). These
> relative paths result in the very same image file name for the same report but
> other data. The only workaround I see here, is to make an additional XML
> transformation of the XSL to find such relative paths and replace them with
> temporary full paths, which is not very elegant.

I'd simply determine a unique ID for each report instance (something as simple as a counter) and pass that in as an XSLT parameter. The stylesheet can then append that to the URI: file:myimage.png?id=12345, file:myimage.png?id=12346....
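
A minimal sketch of that suggestion with the standard JAXP API follows. The stylesheet is reduced to a single fo:external-graphic, and the parameter name reportId and the class name ReportIdParam are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ReportIdParam {
    // Hypothetical stylesheet: 'reportId' is an illustrative parameter name.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:param name='reportId'/>"
        + "<xsl:template match='/'>"
        // The attribute value template appends the parameter to the image URI.
        + "<fo:external-graphic src='file:myimage.png?id={$reportId}'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String reportId) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        t.setParameter("reportId", reportId); // unique per report instance
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader("<report/>")), new StreamResult(out));
        return out.toString();
    }
}
```

Each call with a different reportId yields a different src value, so the image cache sees a different URI for each report.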

> I wonder, how I can get FOP working to process multiple documents in multiple
> threads.

That's what I was thinking, too. You're not going to be happy when you always write the same file. IMO that's a really bad idea. The trick with the different directories is not a bad idea if you actually have to write the image to a file in the first place.

> I guess, the only promising approach so far (FOP 0.95) is, to use new
> FopFactories and UserAgents for each thread and each report generated. But the
> note in http://xmlgraphics.apache.org/fop/0.94/embedding.html#multithreading
> ("Apache FOP may currently not be completely thread safe.") is not very
> encouraging.

That's just to cover our collective asses. FOP is thread-safe (if no little bug has sneaked in somewhere due to some oversight, multi-threading testing is not part of our normal test suite). But that doesn't mean it's not a good idea to do careful multi-threading testing of your application as a whole.

I'm afraid I can't give you the best idea with the information I currently have. I don't know how you create your image. I assume you generate it before you call FOP. Assuming you have some information that lets you identify the data that needs to be turned into an image, consider passing that information [1] into the XSLT stylesheet as an XSLT parameter (similar to what I suggested above) but this time, you use this information to build up a private URI that holds all information to uniquely identify the image belonging to that report. Then, write a URIResolver that can deal with this private URI scheme to generate the image on the fly. That might actually allow you to bypass writing the image to a file, thus making the whole thing faster.

[1] I'll try to show this by example: Assuming you can gather all your data from your database (assuming you use one) you could pass in the ID of the main record. my:report?id=873468&color=red could then be your unique URI for the report 873468 and some data shall be highlighted in red (random example feature). Your URIResolver will listen to the "my" scheme and parse it, then return an InputStream (or for example a DOM (DOMSource) in case of SVG) that accesses the finished image.
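
As a rough sketch of [1]: a JAXP URIResolver that reacts only to the "my" scheme, parses the parameters and returns a stream. MySchemeResolver and its parse helper are hypothetical, and the SVG it produces is a placeholder for whatever the application would really render.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical resolver for URIs such as my:report?id=873468&color=red
class MySchemeResolver implements URIResolver {

    @Override
    public Source resolve(String href, String base) {
        URI uri = URI.create(href);
        if (!"my".equals(uri.getScheme())) {
            return null; // not ours: leave it to the default resolution
        }
        Map<String, String> params = parse(uri.getSchemeSpecificPart());
        // Placeholder rendering; a real implementation would draw the report data.
        String svg = "<svg xmlns='http://www.w3.org/2000/svg' width='10' height='10'>"
                + "<desc>report " + params.get("id") + "</desc>"
                + "<rect width='10' height='10' fill='" + params.get("color") + "'/>"
                + "</svg>";
        StreamSource source = new StreamSource(
                new ByteArrayInputStream(svg.getBytes(StandardCharsets.UTF_8)));
        source.setSystemId(href);
        return source;
    }

    // Splits "report?id=873468&color=red" into its key/value pairs.
    static Map<String, String> parse(String schemeSpecificPart) {
        Map<String, String> params = new HashMap<>();
        int q = schemeSpecificPart.indexOf('?');
        if (q < 0) {
            return params;
        }
        for (String pair : schemeSpecificPart.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }
}
```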

Maybe that helps.
Comment 13 M.H. 2008-10-21 05:35:24 UTC
Thanks for all these ideas! We first create all data (XSL, XML, SVG) in a temporary directory for each report. Then we call FOP to transform the XSL+XML to PDF. The references to the external SVG files are in the XSLs (as relative paths to these files).

With this approach, other developers first design the report "offline" on their workstation with their tools. We can also re-create the PDF anytime later with third party tools like fop.bat instead of our Java application. For debugging, we can look in any xsl, xml and svg as the files are there. Anyway, I wonder how you can generate the SVG on the fly and pass the SVGDOMSource to FOP, as the image cache is ignoring the URIResolver anyway (as I learned now).
Comment 14 Jeremias Maerki 2008-10-21 06:01:55 UTC
(In reply to comment #13)
> Thanks for all these ideas! We first create all data (XSL, XML, SVG) in a
> temporary directory for each report. Then we call FOP to transform the XSL+XML
> to PDF. The references to the external SVG files are in the XSLs (as relative
> paths to these files).
>
> With this approach, other developers first design the report "offline" on their
> workstation with their tools. We can also re-create the PDF anytime later with
> third party tools like fop.bat instead of our Java application. For debugging,
> we can look in any xsl, xml and svg as the files are there.

This sounds like we might have to rethink how we treat the URI when caching the image. If it's a relative URI, we'd have to prepend the base URI and only use the absolute URI in the image cache. I don't think we do that now. That could actually solve your problem.

> Anyway, I wonder
> how you can generate the SVG on the fly and pass the SVGDOMSource to FOP, as
> the image cache is ignoring the URIResolver anyway (as I learned now).

The thing is: We assume that a resource can be identified uniquely by its URI. After all there's the word "identifier" in "URI". If the same URI comes back, we assume it's the same image. The URIResolver is only used when we have to load the image (which is done once). The image loader framework then puts the loaded image (subclass of org.apache.xmlgraphics.image.loader.Image) into the cache under this URI, further identified by the ImageFlavor (as multiple representations of the same image can be stored in the image cache). So if you re-request the same URI again, the cache returns the image directly. No detour through the URIResolver.
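
The lookup order described here can be modelled in a few lines. This is a toy model only: UriKeyedCache is a hypothetical name, the loader function stands in for URIResolver plus image loading, and the ImageFlavor part of the key is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of a cache keyed by the original URI; not FOP's actual classes.
class UriKeyedCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> loader; // stands in for resolver + image loading
    int loads = 0; // how often the loader was really invoked

    UriKeyedCache(Function<String, String> loader) {
        this.loader = loader;
    }

    String get(String originalUri) {
        String hit = cache.get(originalUri);
        if (hit != null) {
            return hit; // cache hit: the loader (and thus the resolver) is bypassed
        }
        loads++;
        String image = loader.apply(originalUri);
        cache.put(originalUri, image);
        return image;
    }
}
```

Requesting "chart.svg" twice invokes the loader once, no matter what the loader would return the second time. Only a different key, for example "chart.svg?id=2", triggers a new load.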

So, to pass in an SVG DOM, your URIResolver will create a DOMSource instead of a StreamSource. PreloaderSVG can make use of a DOMSource, so it doesn't have to be serialized to a stream first, in case you build your SVG as a DOM somewhere.
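
Building such a DOMSource needs nothing beyond the JDK. The sketch below creates an empty SVG root element and wraps it; SvgDomSourceSketch and build are hypothetical names, and whether a given FOP version accepts the result depends on the preloader mentioned above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SvgDomSourceSketch {
    static final String SVG_NS = "http://www.w3.org/2000/svg";

    // Builds a minimal SVG document in memory and wraps it in a DOMSource.
    static DOMSource build(String systemId) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // SVG elements must live in the SVG namespace
        Document doc = dbf.newDocumentBuilder().newDocument();
        Element svg = doc.createElementNS(SVG_NS, "svg");
        svg.setAttribute("width", "170mm");
        svg.setAttribute("height", "120mm");
        doc.appendChild(svg);
        // The second argument becomes the system ID of the Source.
        return new DOMSource(doc, systemId);
    }
}
```

A URIResolver.resolve() implementation could return the result of build(href) instead of a StreamSource.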

Comment 15 M.H. 2008-10-21 09:36:39 UTC
(In reply to comment #14)
> The thing is: We assume that a resource can be identified uniquely by its URI.
> After all there's the word "identifier" in "URI". If the same URI comes back,
> we assume it's the same image. The URIResolver is only used when we have to
> load the image (which is done once). The image loader framework then puts the
> loaded image (subclass of org.apache.xmlgraphics.image.loader.Image) into the
> cache under this URI, further identified by the ImageFlavor (as multiple
> representations of the same image can be stored in the image cache). So if you
> re-request the same URI again, the cache returns the image directly. No detour
> through the URIResolver.

Okay, I think I fully understood this and this is basically okay. So, the URIResolver is no suitable way of changing these URIs, alas. So what is a URIResolver good for, if it's just ignored in some cases (here: image cache already has URI and doesn't call URIResolver anymore)? Even the first time my URIResolver is called, the image cache still has the original URI instead of the changed URI from the URIResolver. Or is this behaviour also as intended?



Comment 16 Jeremias Maerki 2008-10-21 14:44:43 UTC
(In reply to comment #15)
> 
> Okay, I think I fully understood this and this is basically okay. So, the
> URIResolver is no suitable way of changing these URIs, alas. So what is a
> URIResolver good for, if it's just ignored in some cases (here: image cache
> already has URI and doesn't call URIResolver anymore)? Even the first time my
> URIResolver is called, the image cache still has the original URI instead of
> the changed URI from the URIResolver. Or is this behaviour also as intended?

Yes, the URIResolver is ignored when the image is in the cache, but it's really only used for loading the actual image. The behaviour is as I intended it to be. I designed the cache so it uses the image's URI (i.e. an identifier) to uniquely identify the resource. Let me explain the design decision with some more information, to show you what kind of cases need to be handled:

We have to support different kinds of URIs on fo:external-graphic and for other resources:

http://images.company.com/logo.jpg   (this is a URL, and therefore a URI)
file:///C:/Images/logo.jpg   (this is a URL, and therefore a URI)

--> Direct access because they are URLs

urn:images:13487973   (this is a URN, and therefore a URI)

--> Identifier, the specifier doesn't care where the resource comes from



URI resolution means: Turn URI into a Resource.
The resource could be represented by a URL but doesn't have to be, because:

Possibilities:
- CatalogResolver: Map URIs to URLs.
  Example:
    urn:images:13487973 --> http://db-server/images?id=13487973 (provided by a servlet somewhere on a server)
    (Resolved system ID (URL) available)

- Private URI Resolver: 
  Example:
    urn:images:13487973 --> URIResolver directly returns a StreamSource with an InputStream for accessing the image
    (There might or might not be a system ID (URL) in this case)

If we implemented what you're wishing for, what would we do if there's no systemID? Can we be sure that the system ID is always more stable/correct than the original URI for use in the image cache? How do we decide what to use? IMO, if you use the same URI for different images, you violate the "identifier" purpose of the URI. The resource is no longer unique. Using the resolved system ID would only be a work-around. Well, this is my view and I can be wrong. But it's also how FOP has done it the last few years even before the image loader framework.

Notes on Image Loader Framework implementation:
For normal URLs (HTTP, FTP...) a stream is opened and decorated with an ImageIO ImageInputStream which provides random I/O access to the image. For that, the image is mirrored locally, either in memory or temporary file. Special care has to be taken that for the same URL, no two requests have to be initiated (for pre-loading and loading) which would cause additional round-trips.
File URLs are handled in a particular way. For those, direct random I/O access can be provided directly.

I guess I'll look into prepending the base URI to relative URIs tomorrow to make the original URI "more unique". That is almost certain to fix your problem.
Comment 17 M.H. 2008-10-22 09:56:52 UTC
(In reply to comment #16)
> I guess I'll look into prepending the base URI to relative URIs tomorrow to
> make the original URI "more unique". That is almost certain to fix your
> problem.

Yes, that would probably solve the issue. I also played around with some XSLT code to detect the current base URI (that has the unique subdirectory), but this would require XSLT 2.0 and some nasty string cutting. With the standard Java Xalan (XSLT 1.0) I found no way to change my relative path to a unique full path at runtime.

So, I'm looking forward to this change! Thanks for taking time!
Comment 18 Jeremias Maerki 2008-10-23 04:48:08 UTC
Created attachment 22771 [details]
Proposed patch against FOP Trunk for URI pre-resolution

As promised I've looked into it (had some precious train time yesterday). I found two possible approaches to "pre-resolve" the URI relative to a base URI, so the image cache gets more absolute URIs. My first attempt was to build that into the image loading framework but that caused a lot of changes (even API changes). The second attempt is less invasive but needs changes in more than one place in FOP (ExternalGraphic and all renderers). To illustrate this I've just patched the PDFRenderer for the moment.

The patch uses java.net.URI (since Java 1.4) to do the URI resolution (using URI.resolve(URI), not JAXP-style URIResolver resolution!). That seems to do the job just fine. I'm not 100% sure this is ultimately the right approach which is why I'm just posting a proposed patch here rather than doing the change directly. The change itself should be pretty safe because if there's a problem parsing the URI, the original URI is simply returned. Only relative URIs should be affected.

The patch requires the Base URI (FOUserAgent.setBaseURL(String)) to be set for the document. From the command-line this will be done automatically (the source file's directory is used).

Feedback and further ideas welcome.
Comment 19 Jeremias Maerki 2008-10-23 04:56:12 UTC
BTW, just to explain what happens with this patch:

If you have src="chart.svg" on your external-graphic and the base URI is "file:/C:/reports/321cb123db23/", the image loader framework receives as URI: "file:/C:/reports/321cb123db23/chart.svg". Before the patch it would only receive "chart.svg".
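
The effect can be reproduced with java.net.URI alone. The sketch below is not the attached patch; preResolve is a hypothetical helper that mirrors the behaviour described in comment 18 (resolve against the base URI, return the original URI if parsing fails).

```java
import java.net.URI;
import java.net.URISyntaxException;

class PreResolve {
    // Resolves 'uri' against 'baseUri'; on any parsing problem the original is returned.
    static String preResolve(String uri, String baseUri) {
        if (baseUri == null) {
            return uri;
        }
        try {
            // URI.resolve leaves absolute URIs untouched and only completes relative ones.
            return new URI(baseUri).resolve(new URI(uri)).toString();
        } catch (URISyntaxException e) {
            return uri;
        }
    }
}
```

With the values from the example, preResolve("chart.svg", "file:/C:/reports/321cb123db23/") returns "file:/C:/reports/321cb123db23/chart.svg", while an absolute URI such as "http://images.company.com/logo.jpg" comes back unchanged.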

Maybe it would actually be better to delay the "pre-resolution" as long as possible, i.e. to do it inside the image loader framework (my first approach). But it would still require a change for ExternalGraphic and all renderers because the currently applicable base URI is passed along.

If someone wanted to go even further, support for "xml:base" (http://www.w3.org/TR/xmlbase/) could be added to FOP to override the base URI for certain elements. Shouldn't be too hard to implement.
Comment 20 M.H. 2008-10-23 14:37:44 UTC
Wow, Jeremias! Thanks for working on this! I guess I have to find out how to get the latest development version of FOP and how to compile it. I would like to see if your patch fixes my specific problem ...
Comment 21 M.H. 2008-10-27 06:11:41 UTC
Just downloaded the current trunk, built it and get a

"I/O exception while reading font cache (org.apache.fop.fonts.EmbedFontInfo; local class incompatible: stream classdesc serialVersionUID = -9075848379822693399, local class serialVersionUID = 8755432068669997367). Discarding font cache file."

error. As only fop.jar and xmlgraphics-commons.jar changed, and I double-checked that my classpath contained these, I wonder what else is wrong.
Comment 22 Jeremias Maerki 2008-10-27 06:44:34 UTC
(In reply to comment #21)
> Just downloaded the current trunk, built it and get a
> 
> "I/O exception while reading font cache (org.apache.fop.fonts.EmbedFontInfo;
> local class incompatible: stream classdesc serialVersionUID =
> -9075848379822693399, local class serialVersionUID = 8755432068669997367).
> Discarding font cache file."
> 
> error. As only fop.jar and xmlgraphics-commons.jar changed, and I double-checked
> that my classpath contained these, I wonder what else is wrong.
> 

You can safely ignore that. It just means that the font cache file is being rebuilt.
Comment 23 Chris Bowditch 2008-10-27 06:55:04 UTC
This message is a warning that FOP failed to read from the Font cache. Which means any Font auto detection or Font directories will be re-scanned. So this failure doesn't break anything. To avoid the warning you can simply delete the old Font Cache file, which according to [1] lives in ${base}\conf\font.cache. Or you can disable Font Caching altogether using the option "use-cache"

I think I will create a FAQ for this as it comes up a lot.

[1] http://xmlgraphics.apache.org/fop/0.94/configuration.html#general-elements
Comment 24 M.H. 2008-11-21 07:50:46 UTC
So, I replaced fop.jar and xmlgraphics-commons.jar with the new trunk version but the problem persists: the image cache retrieves the same (first) SVG. Putting back the 

   FOP.clearImageCache();

it works again.

(By the way with the new JARs I get lots of new errors and warnings, e.g.:

WARNING T16: Ascender and descender together are larger than the em box. This could lead to a wrong baseline placement in Apache FOP.

SEVERE  T16: Unsupported TrueType font: Unicode cmap table not present. Aborting
WARNING T16: Unable to load font file: file:/C:/WINNT/FONTS/mapsym.ttf. Reason: java.io.IOException: TrueType font is not supported: file:/C:/WINNT/FONTS/mapsym.ttf
...)
Comment 25 Jeremias Maerki 2008-11-21 07:59:39 UTC
(In reply to comment #24)
> So, I replaced fop.jar and xmlgraphics-commons.jar with the new trunk version
> but the problem persists: the image cache retrieves the same (first) SVG.
> Putting back the 
> 
>    FOP.clearImageCache();
> 
> it works again.

I haven't applied the patch, yet. I was waiting for your feedback.

> (By the way with the new JARs I get lots of new errors and warnings, e.g.:
> 
> WARNING T16: Ascender and descender together are larger than the em box. This
> could lead to a wrong baseline placement in Apache FOP.

That particular one is gone since yesterday in FOP Trunk.

> SEVERE  T16: Unsupported TrueType font: Unicode cmap table not present.
> Aborting
> WARNING T16: Unable to load font file: file:/C:/WINNT/FONTS/mapsym.ttf. Reason:
> java.io.IOException: TrueType font is not supported:
> file:/C:/WINNT/FONTS/mapsym.ttf
> ...)
> 

Those are normal and expected on a Windows machine. You've got font auto-detection turned on, but FOP doesn't support all fonts it finds. Those error and warnings are there to inform you which fonts could not be made available inside FOP.
Comment 26 M.H. 2008-11-21 11:39:50 UTC
Oh, I see that there is an attachment. How can I apply this patch? When I look into the text file there are new (+) lines and removed (-) lines. I guess there is some kind of tool to simply run with this file? Or do I have to find the places in the code and replace them by hand?
Comment 27 Jeremias Maerki 2008-11-21 23:48:27 UTC
(In reply to comment #26)
> Oh, I see that there is an attachment. How can I apply this patch? When I look
> into the text file there are new (+) lines and removed (-) lines. I guess there
> is some kind of tool to simply run with this file? Or do I have to find the
> places in the code and replace them by hand?
> 

The attached patch is a unified diff which is the most popular format for patches. The easiest way to automatically apply it is Team/Apply Patch... inside Eclipse if you use that. Otherwise, TortoiseSVN on Windows will also make it easy. The most universal way is the "patch" utility: http://en.wikipedia.org/wiki/Patch_(Unix)
HTH
Comment 28 Alex Watson 2009-04-06 00:20:02 UTC
I am experiencing problems with the same image caching issue, and have a few suggestions for an alternate approach to resolving this issue. Unfortunately our base URI does not change so the patch included already for this defect does not help us here.

We have a FOP-based webserver, that generates PDF files with embedded images. We also use a URIResolver to intercept requests for images and map them onto different resources. Many of the images are static (logos etc), but some of these images change infrequently (perhaps once a week).

I discovered this problem when an image was missing on our webserver, but replacing the image did not fix the PDF (the absence of the image was cached as well). Similarly, deleting or changing an image did not alter the PDF.

I am concerned about performance and scalability, so I would rather not create a new FopFactory for each request. The workaround to call fopFactory.getImageManager().getCache().clearCache() is brute-force - this will flush images for all threads and reports (even if another thread is rendering a PDF at the time).

I think the ideal behaviour is to allow image caching to be optionally configurable on the session/run rather than globally on the FopFactory.

My suggestion is to allow the FOUserAgent to be (optionally) configured with its own ImageManager. The FOUserAgent constructor would default to the ImageManager from the FopFactory (to preserve existing behaviour by default).

The ExternalGraphic and PDFRenderer classes would change from
        FOUserAgent userAgent = getUserAgent();
        ImageManager manager = userAgent.getFactory().getImageManager();
to
        ImageManager manager = userAgent.getImageManager();

To use session based image caching would simply require a new method to be invoked on the FOUserAgent to create its own ImageManager eg. FOUserAgent enableSessionCaching() {this.imageManager = new ImageManager(factory); }.

An additional benefit is that session based images would be cleaned up much sooner (helping with our monitoring of free memory within the app).

Another useful enhancement would allow the ImageCache to be configurable to exclude some URI patterns from its cache (perhaps by extending ImageCacheListener). This could enable a session based cache to cache some images (based on uri) and then fall through to the global cache for other images.
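
A pattern-based exclusion of this kind (also floated in comment 3) might look like the sketch below. It is not an existing FOP feature; CacheExclusions and isCacheable are hypothetical names, and how such a check would be wired into the ImageCache is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical filter: URIs matching any configured pattern bypass the cache.
class CacheExclusions {
    private final List<Pattern> noCache = new ArrayList<>();

    CacheExclusions(List<String> regexes) {
        for (String regex : regexes) {
            noCache.add(Pattern.compile(regex));
        }
    }

    /** Returns false if the whole URI matches one of the exclusion patterns. */
    boolean isCacheable(String uri) {
        for (Pattern p : noCache) {
            if (p.matcher(uri).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

With the patterns ".*\.svg" and "urn:charts:.*", generated SVG charts would be reloaded on every run while static logos such as http://images.company.com/logo.jpg stay cached.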
Comment 29 Jeremias Maerki 2009-04-07 13:58:24 UTC
(In reply to comment #28)
Hi Alex,
attaching the image cache to the FOUserAgent doesn't make much sense IMO as you cannot profit from cached images over multiple document runs. Usually, the renderer itself already caches the image once per document. I don't think there's a problem with cleanup, as you mention. We're using soft references: if not enough memory is there, the images get automatically discarded. This is well tested. Your suggestion about patterns to exclude certain URIs from caching is an idea that could be investigated but I'm not sure it helps here. I think it's better that we try this pre-resolution approach. I just have to find some time and motivation to finish that.
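
For readers unfamiliar with the soft-reference behaviour mentioned here, the general technique looks like this in plain Java. SoftCache is a generic illustration with a hypothetical name, not the cache class used by the image loader framework.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Generic soft-reference cache: entries may be reclaimed by the GC under memory pressure.
class SoftCache<V> {
    private final Map<String, SoftReference<V>> map = new ConcurrentHashMap<>();

    void put(String uri, V image) {
        map.put(uri, new SoftReference<>(image));
    }

    V get(String uri) {
        SoftReference<V> ref = map.get(uri);
        if (ref == null) {
            return null; // never cached
        }
        V value = ref.get();
        if (value == null) {
            map.remove(uri, ref); // the referent was collected; drop the stale entry
        }
        return value;
    }
}
```

A caller has to treat a null result as a cache miss and reload the image.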

Incidentally, the problem of the missing image that was replaced but then not picked up by FOP has been solved a few days ago by:
http://svn.apache.org/viewvc?rev=759144&view=rev
http://svn.apache.org/viewvc?rev=759150&view=rev

Please check out XML Graphics Commons Trunk to see if your situation improves.
http://svn.apache.org/repos/asf/xmlgraphics/commons/trunk
Comment 30 Alex Watson 2009-04-09 02:08:31 UTC
Hi Jeremias,
Thanks for the feedback. We currently render most of our PDFs twice - once while discarding output to calculate the page count and then again with the page count embedded within the document - I had expected the image cache to get more of a workout within a single session/run especially if logos are rendered repeatedly on each page.

However, I was not aware that the renderers were caching the images themselves - can you point me to where in the code base that is happening?

My thinking is that the current image caching strategy works well for a mostly static set of images - but is less flexible when the images are more dynamic in nature. I don't expect FOP to handle all of the various caching optimisations that different people might want, but it might be a very small code change to let people take care of it themselves.

At the moment there is no way to alter the image cache behaviour (the objects are private and there are no setters to substitute them) - assuming that I do not want to modify the FOP code base in my system.

I agree that only using the ImageManager as a cache for a single run would be far less beneficial for performance, but it would allow people to implement their own caching strategy via the UriResolver hooks.

Even a simple code modification to disable (nullify) the cache on the ImageManager would allow people to implement their own image caching via the UriResolver hooks.

Thanks again.

btw - would you prefer me to raise this as a separate issue?
Comment 31 Jeremias Maerki 2009-04-09 12:05:37 UTC
Hi Alex

(In reply to comment #30)
> Hi Jeremias,
> Thanks for the feedback. We currently render most of our PDFs twice - once
> while discarding output to calculate the page count and then again with the

Just to speed things up, you may want to render to the area tree XML (or to the new intermediate format with FOP Trunk) instead of to PDF if all you need is the page count. That should be faster and will not need to load all images on the first pass.

> page count embedded within the document - I had expected the image cache to get
> more of a workout within a single session/run especially if logos are rendered
> repeatedly on each page.
> 
> However, I was not aware that the renderers were caching the images themselves
> - can you point me to where in the code base that is happening?

In http://svn.eu.apache.org/viewvc/xmlgraphics/fop/trunk/src/java/org/apache/fop/render/pdf/PDFRenderer.java?view=markup

in putImage():
        PDFXObject xobject = pdfDoc.getXObject(uri);
        if (xobject != null) {
            float w = (float) pos.getWidth() / 1000f;
            float h = (float) pos.getHeight() / 1000f;
            placeImage((float)pos.getX() / 1000f,
                       (float)pos.getY() / 1000f, w, h, xobject);
            return;
        }

It's not really "caching", but an image only needs to be embedded once in a PDF and can be reused multiple times. Not all output formats can do that, though.

> My thinking is that the current image caching strategy works well for a mostly
> static set of images - but is less flexible when the images are more dynamic in
> nature. I don't expect FOP to handle all of the various caching optimisations
> that different people might want, but it might be a very small code change to
> let people take care of it themselves.
>
> At the moment there is no way to alter the image cache behaviour (the objects
> are private and there are no setters to substitute them) - assuming that I do
> not want to modify the FOP code base in my system.
> 
> I agree that only using the ImageManager as a cache for a single run would be
> far less beneficial for performance, but it would allow people to implement
> their own caching strategy via the UriResolver hooks.
> 
> Even a simple code modification to disable (nullify) the cache on the
> ImageManager would allow people to implement their own image caching via the
> UriResolver hooks.

Ok, I see what you mean. I guess there are various levels where this can happen. One would be on the level you suggest. HTTP, for example, allows checking file stamps, which could be used to trigger a reload of an image. But that is going to be difficult to implement with the URIResolver approach. In a high-volume system this might also result in too many round-trips for file stamp checking. How about just doing the same as I've done with the expiration for the invalid URIs? We specify an expiration for the cached images. When the lease expires, the image is discarded and reloaded. That has very little management overhead, should address your use case, and should be implementable with only a few lines of code now that we've already got the expiration code. Would that work for you?
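
The lease idea could look roughly like the following. This is only a sketch under assumptions, not the existing expiration code: ExpiringImageCache, leaseMillis and the injectable clock are hypothetical names, and images are plain byte arrays.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical cache that reloads an image once its lease has expired. */
class ExpiringImageCache {
    private static final class Entry {
        final byte[] image;
        final long loadedAtMillis;

        Entry(byte[] image, long loadedAtMillis) {
            this.image = image;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long leaseMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    ExpiringImageCache(long leaseMillis, LongSupplier clock) {
        this.leaseMillis = leaseMillis;
        this.clock = clock;
    }

    byte[] get(String uri, Function<String, byte[]> loader) {
        long now = clock.getAsLong();
        Entry entry = entries.get(uri);
        if (entry == null || now - entry.loadedAtMillis >= leaseMillis) {
            // Missing or expired: discard the old image and load it again.
            entry = new Entry(loader.apply(uri), now);
            entries.put(uri, entry);
        }
        return entry.image;
    }
}
```

The clock is injected only so that expiry can be tested without sleeping. A stale image can still be served until its lease runs out, so the lease length trades freshness against reload cost.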

> Thanks again.
> 
> btw - would you prefer me to raise this as a separate issue?

No, I think it's fine to gather all information here, although it's maybe not the exact same case.

BTW, I'd still be grateful for feedback on the URI "pre-resolution" idea in my patch.
Comment 32 Alex Watson 2009-04-13 22:03:00 UTC
Hi Jeremias,
Thanks for the pointers to the code base - that is a real help to my understanding.

I had considered the expiration idea for all images (rather than just for missing images), but was not sure if it was the ideal solution. This solution would be perfect for me (with my current problem), but it would not have helped M.H. who originally raised this issue. It would depend upon how configurable the expiration was and how expensive it was to re-fetch an image.

Can you explain your comments about the UriResolver being expensive in high-volume applications? I didn't quite understand the part about HTTP and timestamps.

I know that the built-in (default) UriResolver will create connections to HTTP webservers or local FileSystems (etc) - and this can become expensive without any caching strategies.

However, when a developer plugs in their own UriResolver, it can be as smart and efficient as they like (and does not need to create external connections). We have a global ResourceResolver class that implements the UriResolver interface. This implements its own caching strategy (and caches fonts, nested XSLT, imported XML as well as images). Our implementation primarily loads resources from disk (file), but future extensions to our system could allow this to generate XML, images or even XSLT on the fly.
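
As an illustration of what such a resolver could look like, here is a minimal sketch built on the standard javax.xml.transform.URIResolver interface. It is not the ResourceResolver described above: the class name CachingResourceResolver, the loader function and the invalidate method are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical resolver; the class name and the loader are illustrative. */
class CachingResourceResolver implements URIResolver {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    // Assumed to map a logical resource name to its bytes (disk, generator, ...).
    private final Function<String, byte[]> loader;

    CachingResourceResolver(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    @Override
    public Source resolve(String href, String base) {
        // The base parameter is ignored: hrefs are treated as logical names.
        byte[] bytes = cache.computeIfAbsent(href, loader);
        // Hand out a fresh stream each time; the cached bytes are shared.
        return new StreamSource(new ByteArrayInputStream(bytes), href);
    }

    /** Lets the application drop one resource, e.g. after regenerating it. */
    void invalidate(String href) {
        cache.remove(href);
    }
}
```

A resolver like this only controls how bytes are fetched. Any cache that sits in front of it and keys on the same URI would still have to be bypassed or disabled for the invalidation to take effect.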

I guess that is why I would prefer a hook that will let me take care of (part of) the caching solution.

Sorry I cannot help with the patch - we only specify "logical" resources within our XSLT, they are all mapped to real resources via our UriResolver. We do not use the Base parameter. For what it is worth, I think the patch looks OK to me and may help some users - but it does not really address my concerns.

Cheers!

Alex
Comment 33 M.H. 2010-03-01 14:48:22 UTC
Just to update my problem: we still serialize FOP processing and clear the image cache after each single PDF generation. (I didn't have time to re-build a new FOP from the CSV tree as my first try with the latest FOP trunk produced lots of errors during building FOP. So I wait for a next official release.)
Comment 34 M.H. 2010-08-30 05:28:09 UTC
So, FOP 1.0 is out. I would like to test if the new URI handling in FOP 1.0 solves this issue. However, due to the randomness of this bug, I wonder how I can really test this and not just say "hey, it doesn't occur anymore" just to get killed by my boss when it still happens in a production environment with highly concurrent FOP processing.

The last reply #32 from Alex Watson, "Sorry I cannot help with the patch", is not very comforting. So is there an explicit change in URI handling so that relative paths in the XSL-FO are expanded to the full URL/file name in the cache?
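
To spell out what such an expansion would mean for the cache key, here is a minimal sketch using java.net.URI. It is an assumption about the idea only and says nothing about what FOP 1.0 actually does; CacheKeys, cacheKey and the run-a/run-b directories are made-up names.

```java
import java.net.URI;

/** Hypothetical helper: derives a cache key from a base URI and an href. */
class CacheKeys {
    static String cacheKey(String baseUri, String href) {
        // Resolve the (possibly relative) href against the base, then normalize,
        // so the same file name under different directories yields different keys.
        return URI.create(baseUri).resolve(href).normalize().toString();
    }
}
```

With this scheme, cacheKey("file:/D:/Tmp/reports/run-a/", "C_PerfRiskCons_M.svg") and cacheKey("file:/D:/Tmp/reports/run-b/", "C_PerfRiskCons_M.svg") differ, so two reports could no longer share one cached image just because the file names match. An href that is already absolute is returned unchanged.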
Comment 35 Glenn Adams 2012-03-30 20:36:39 UTC
move to normal priority, pending further action
Comment 36 M.H. 2012-03-30 22:08:18 UTC
As there is still no new release, I can't go into an evaluation phase. So, I'm waiting for a release so that I can get some time to build a reproducible test case. Is there any planned deadline for a new FOP release?
Comment 37 Glenn Adams 2012-03-30 22:23:28 UTC
(In reply to comment #36)
> As there is still no new release, I can't go into an evaluation phase. So, I'm
> waiting for a release so that I can get some time to build a reproducible test
> case. Is there any planned deadline for a new FOP release?

can you use a nightly build [1] to test? see also [2]

[1] http://ci.apache.org/projects/xmlgraphics/fop/snapshots/
[2] http://ci.apache.org/builders/fop-trunk/
Comment 38 Glenn Adams 2012-04-07 01:41:34 UTC
resetting P2 open bugs to P3 pending further review