Bug 46336 - Synchronization fault in FontInfoFinder
Summary: Synchronization fault in FontInfoFinder
Status: NEW
Alias: None
Product: Fop - Now in Jira
Classification: Unclassified
Component: fonts (show other bugs)
Version: trunk
Hardware: All All
: P3 normal
Target Milestone: ---
Assignee: fop-dev
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-12-04 01:12 UTC by ilj
Modified: 2012-04-11 06:17 UTC (History)
0 users



Attachments
patch proposal (7.01 KB, patch)
2008-12-22 16:03 UTC, Andreas L. Delmelle
Details | Diff
updated patch (7.63 KB, patch)
2008-12-22 16:49 UTC, Andreas L. Delmelle
Details | Diff
correction (8.24 KB, patch)
2008-12-22 16:59 UTC, Andreas L. Delmelle
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description ilj 2008-12-04 01:12:15 UTC
I have a multi-threaded rendering application. It produced this exception:

java.lang.NullPointerException
at org.apache.fop.fonts.autodetect.FontInfoFinder.find(FontInfoFinder.java:172)
at org.apache.fop.render.PrintRendererConfigurator.addFontInfoListFromFileList(PrintRendererConfigurator.java:233)
at org.apache.fop.render.PrintRendererConfigurator.buildFontListFromConfiguration(PrintRendererConfigurator.java:140)
at org.apache.fop.render.PrintRendererConfigurator.configure(PrintRendererConfigurator.java:95)
at org.apache.fop.render.pdf.PDFRendererConfigurator.configure(PDFRendererConfigurator.java:71)
at org.apache.fop.render.RendererFactory.createRenderer(RendererFactory.java:187)
at org.apache.fop.area.RenderPagesModel.<init>(RenderPagesModel.java:68)
at org.apache.fop.area.AreaTreeHandler.setupModel(AreaTreeHandler.java:127)
at org.apache.fop.area.AreaTreeHandler.<init>(AreaTreeHandler.java:102)
at org.apache.fop.render.RendererFactory.createFOEventHandler(RendererFactory.java:224)
at org.apache.fop.fo.FOTreeBuilder.<init>(FOTreeBuilder.java:100)
at org.apache.fop.apps.Fop.createDefaultHandler(Fop.java:100)
at org.apache.fop.apps.Fop.<init>(Fop.java:78)
at org.apache.fop.apps.FopFactory.newFop(FopFactory.java:247)
...


I guess the problem is, that you don't synchronize usage of FontCache in this method:

150     public EmbedFontInfo find(URL fontUrl, FontResolver resolver, FontCache fontCache) {
151         String embedUrl = null;
152         embedUrl = fontUrl.toExternalForm();
153         
154         long fileLastModified = -1;
155         if (fontCache != null) {
156             try {
157                 URLConnection conn = fontUrl.openConnection();
158                 try {
159                     fileLastModified = conn.getLastModified();
160                 } finally {
161                     //An InputStream is created even if it's not accessed, but we need to close it.
162                     IOUtils.closeQuietly(conn.getInputStream());
163                 }
164             } catch (IOException e) {
165                 // Should never happen, because URL must be local
166                 log.debug("IOError: " + e.getMessage());
167                 fileLastModified = 0;
168             }
169             // firstly try and fetch it from cache before loading/parsing the font file
170             if (fontCache.containsFont(embedUrl)) {
171                 CachedFontInfo fontInfo = fontCache.getFont(embedUrl);
172                 if (fontInfo.lastModified() == fileLastModified) {
173                     return fontInfo;
174                 } else {
175                     // out of date cache item
176                     fontCache.removeFont(embedUrl);
177                 }
178             // is this a previously failed parsed font?
179             } else if (fontCache.isFailedFont(embedUrl, fileLastModified)) {
180                 if (log.isDebugEnabled()) {
181                     log.debug("Skipping font file that failed to load previously: " + embedUrl);
182                 }
183                 return null;
184             }
185         }
186         
187         // try to determine triplet information from font file
188         CustomFont customFont = null;
189         try {
190             customFont = FontLoader.loadFont(fontUrl, resolver);
191         } catch (Exception e) {
192             //TODO Too verbose (it's an error but we don't care if some fonts can't be loaded)
193             if (log.isErrorEnabled()) {
194                 log.error("Unable to load font file: " + embedUrl + ". Reason: " + e.getMessage());
195             }
196             if (fontCache != null) {
197                 fontCache.registerFailedFont(embedUrl, fileLastModified);
198             }
199             return null;
200         }
201         return fontInfoFromCustomFont(fontUrl, customFont, fontCache);     
202     }
203 }
204 

When thread1 and thread2 both enter IF at line 170, one can remove the font and then the other will get null at line 171.
Comment 1 Andreas L. Delmelle 2008-12-04 10:51:38 UTC
Are you certain it is a synchronization failure, or did you only draw that conclusion because the app is multi-threaded and the issue does not arise in single-threaded context?

If I interpret correctly, this points to something other than a synchronization failure...

java.lang.NullPointerException
at org.apache.fop.fonts.autodetect.FontInfoFinder.find(FontInfoFinder.java:172)
...
170             if (fontCache.containsFont(embedUrl)) {
171                 CachedFontInfo fontInfo = fontCache.getFont(embedUrl);
172                 if (fontInfo.lastModified() == fileLastModified) {


The NullPointerException must be caused by fontInfo.lastModified(). 
It therefore appears possible for the FontCache to return true for containsFont(...) and yet return null for getFont(...) with the same URL?

Anyway, I quickly checked the source in FOP Trunk, and getFont() doesn't even exist as a method for FontCache anymore. Can you check if the problem persists if you use trunk instead of 0.95?
Comment 2 ilj 2008-12-04 23:46:27 UTC
unfortunately, i can't check the development trunk. we are already in production and thus we would like to update to somewhat stable releases.

it's obvious to me, that this issue is most probably caused by lack of synchronization, since calling containsFont and getFont is not atomic. and we already had just the same kind of issue within FOP - see bug 46211.

anyway - i can make a workaround for this in my own logic for time being and wait for new stable FOP release.
Comment 3 Andreas L. Delmelle 2008-12-05 10:05:13 UTC
(In reply to comment #2)
> unfortunately, i can't check the development trunk. we are already in
> production and thus we would like to update to somewhat stable releases.
> 
> it's obvious to me, that this issue is most probably caused by lack of
> synchronization, since calling containsFont and getFont is not atomic. and we
> already had just the same kind of issue within FOP - see bug 46211.

Yes, now I see it. If I judge correctly, the problem is still present in FOP Trunk. 
One thread can call find() (non-synchronized), which calls getFontInfos() (non-synchronized), which could call removeFont(). Only the removal is done in a synchronized block... but another thread could indeed at the same time be querying the cache, which explains this issue.

In org.apache.fop.fo.properties.PropertyCache (following the example of Java's ConcurrentHashMap), this is solved by performing a get() twice if necessary. Once without synchronization, but if that attempt fails, we still try a second time in a synchronized block. Although here, it seems like the other way around: a check for contains() succeeded, but the font was removed after that call... Not sure what the best solution would be in this case...

Confirms what I already knew to be true: programming with taking multi-threading contexts into account is definitely an Art-form... Even when /everything/ goes right during testing, that does not exclude race-conditions from popping up months after a release.

Thanks for the report and the feedback!
Comment 4 Andreas L. Delmelle 2008-12-08 10:19:35 UTC
Investigated this issue closer, and one source of errors seems to be in the silent assumption, in FontInfoFinder.find(), that the fileLastModified variable, set right before containsFont() is called, will always be the same as the cached font file's lastModified() it will be compared to if the call returns true and getFontInfos() is triggered.
Not even inlining the call to FontCache.getLastModified() will guarantee that, although it would make occurrences far less likely.
It is conceivable that another thread, in the time between those two calls, has added a newer version of the file to the cache, which would result in FontCache.getFontInfos() removing that entry and return null... The file corresponding to the newly added entry is then reparsed in the original thread and added again to the map (operation which is thus performed twice, without good reason)

Besides that, as it was pointed out earlier, only the writes to the Map backing FontCache are performed in a synchronized fashion. The reads are not, and so may return results that turn out to be unreliable (i.e. they are guaranteed to be correct only for the duration of the call).

Trying to get my head around what actually happens there, seems to show at least /some/ redundancies...
FontInfoFinder.find()
-> fontCache.containsFont()
  -> fontFileMap.containsKey()
-> fontCache.getFontInfos()
  -> fontCache.getFontFile()
    -> fontCache.containsFont()
      -> fontFileMap.containsKey()
    -> fontCache.removeFont()
      -> fontCache.containsFont()
        -> fontFileMap.containsKey()
      -> fontFileMap.remove()
        -> fontFileMap.containsKey() (?)

On the other hand, getFontInfos() itself assumes that getFontFile() will always return a non-null result, while the implementation a few lines up shows otherwise.

IMO, to make access by multiple threads easily manageable, FontCache has too many public methods. In fact, almost none is used anywhere outside the class.  removeFont() is one obvious candidate for a change in visibility. As for the interplay between FontInfoFinder and FontCache, it seems that either the one or the other is assuming too much responsibilities. Can't put my finger yet on which of the two is stepping out of bounds (initial guess would be the latter), but I'm definitely going to check if I can clean that up a little.
Comment 5 Andreas L. Delmelle 2008-12-10 12:23:51 UTC
I think I found a partial solution to the issue. "Partial" means "safer, but still not ideal"

Explanation:
I rewrote FontCache.getFontInfos() to something like:

---
CachedFontFile cff = getFontFile(embedUrl);
if (cff != null) {
  if (cff.lastModified >= lastModified) {
    return cff.getEmbedFontInfos();
  }
  synchronized (changeLock) {
    //redo the fetch to make sure
    cff = getFontFile(embedUrl);
    if (cff != null) {
      if (cff.lastModified < lastModified) {
        //outdated entry, and we're the first one here, so...
        getFontFileMap().remove(embedUrl);
        changed = true;
        return null;
      } else {
        //another thread already detected this, and reloaded
        return cff.getEmbedFontInfos();
      }
    }
  }
}

synchronized (changeLock) {
  cff = getFontFile(embedUrl);
  if (cff != null) {
    //first fetch returned null, but now we do get one...
    //for simplicity, assume this entry will do
    return cff.getEmbedFontInfos();
  }
}

return null
---

In the best case (font present & up-to-date), there's no synchronization overhead. In the worst case, we would perform three fetches: one non-synchronized (returning an outdated entry), a second synchronized (returning null because another thread already removed the font), and a third one returning... what exactly?

Now, it is still not ideal since there is absolutely no guarantee yet that the thread that actually removed the entry will be done with reloading the font (in FontInfoFinder.find()) in time for subsequent threads to obtain the new entry... In theory it is still possible that multiple threads reload the font in parallel, which should actually be avoided, IIC.
It does remain safe, since, although multiple threads reload the font, only one of them will actually add the newer entry to the cache. (see FontCache.addFont())
Comment 6 Andreas L. Delmelle 2008-12-22 16:03:38 UTC
Created attachment 23047 [details]
patch proposal


Another attempt, slightly better IIC, but still not bullet-proof.

If getFontInfos() returns an outdated entry, and another thread has already started the removal/readdition, then the thread will wait until it is notified by either addFont() or registerFailedFont().

Tricky bits:
- after the notification, we're still not sure that it stems from the same font we're waiting for... (any addFont() or registerFailedFont() will trigger notifyAll())
- if the removal/readdition has begun, and one thread is busy reloading the font, getFontInfos() will still return null (parallel loading still remains possible)

Remaining question (not immediately clear to me):
Is it conceivable that one thread removes the outdated entry, yet neither addFont() nor registerFailedFont() is called later? It didn't seem so, but should that be the case, the wait() method should probably be passed a reasonable maximum duration to avoid eternal sleep.
Comment 7 Andreas L. Delmelle 2008-12-22 16:49:31 UTC
Created attachment 23048 [details]
updated patch


Slight improvement over the previous patch:
Added a Set of 'changingFonts', to keep track of which fonts are in the process of being reloaded. 
Upon encountering an outdated entry, its embedUrl is added to the Set. Later on, either addFont() or registerFailedFont() will remove it again. Other threads will wait while the font is in transition, and can check the changingFonts set to see if the font corresponding to the embedUrl was already reloaded.
Comment 8 Andreas L. Delmelle 2008-12-22 16:59:40 UTC
Created attachment 23049 [details]
correction


previous one still buggy on the transient set...
Comment 9 Glenn Adams 2012-04-07 01:42:30 UTC
resetting P2 open bugs to P3 pending further review
Comment 10 Glenn Adams 2012-04-11 06:17:24 UTC
change status from ASSIGNED to NEW for consistency