I have a multi-threaded rendering application. It produced this exception: java.lang.NullPointerException at org.apache.fop.fonts.autodetect.FontInfoFinder.find(FontInfoFinder.java:172) at org.apache.fop.render.PrintRendererConfigurator.addFontInfoListFromFileList(PrintRendererConfigurator.java:233) at org.apache.fop.render.PrintRendererConfigurator.buildFontListFromConfiguration(PrintRendererConfigurator.java:140) at org.apache.fop.render.PrintRendererConfigurator.configure(PrintRendererConfigurator.java:95) at org.apache.fop.render.pdf.PDFRendererConfigurator.configure(PDFRendererConfigurator.java:71) at org.apache.fop.render.RendererFactory.createRenderer(RendererFactory.java:187) at org.apache.fop.area.RenderPagesModel.<init>(RenderPagesModel.java:68) at org.apache.fop.area.AreaTreeHandler.setupModel(AreaTreeHandler.java:127) at org.apache.fop.area.AreaTreeHandler.<init>(AreaTreeHandler.java:102) at org.apache.fop.render.RendererFactory.createFOEventHandler(RendererFactory.java:224) at org.apache.fop.fo.FOTreeBuilder.<init>(FOTreeBuilder.java:100) at org.apache.fop.apps.Fop.createDefaultHandler(Fop.java:100) at org.apache.fop.apps.Fop.<init>(Fop.java:78) at org.apache.fop.apps.FopFactory.newFop(FopFactory.java:247) ... I guess the problem is, that you don't synchronize usage of FontCache in this method: 150 public EmbedFontInfo find(URL fontUrl, FontResolver resolver, FontCache fontCache) { 151 String embedUrl = null; 152 embedUrl = fontUrl.toExternalForm(); 153 154 long fileLastModified = -1; 155 if (fontCache != null) { 156 try { 157 URLConnection conn = fontUrl.openConnection(); 158 try { 159 fileLastModified = conn.getLastModified(); 160 } finally { 161 //An InputStream is created even if it's not accessed, but we need to close it. 162 IOUtils.closeQuietly(conn.getInputStream()); 163 } 164 } catch (IOException e) { 165 // Should never happen, because URL must be local 166 log.debug("IOError: " + e.getMessage()); 167 fileLastModified = 0; 168 } 169 // firstly try and fetch it from cache before loading/parsing the font file 170 if (fontCache.containsFont(embedUrl)) { 171 CachedFontInfo fontInfo = fontCache.getFont(embedUrl); 172 if (fontInfo.lastModified() == fileLastModified) { 173 return fontInfo; 174 } else { 175 // out of date cache item 176 fontCache.removeFont(embedUrl); 177 } 178 // is this a previously failed parsed font? 179 } else if (fontCache.isFailedFont(embedUrl, fileLastModified)) { 180 if (log.isDebugEnabled()) { 181 log.debug("Skipping font file that failed to load previously: " + embedUrl); 182 } 183 return null; 184 } 185 } 186 187 // try to determine triplet information from font file 188 CustomFont customFont = null; 189 try { 190 customFont = FontLoader.loadFont(fontUrl, resolver); 191 } catch (Exception e) { 192 //TODO Too verbose (it's an error but we don't care if some fonts can't be loaded) 193 if (log.isErrorEnabled()) { 194 log.error("Unable to load font file: " + embedUrl + ". Reason: " + e.getMessage()); 195 } 196 if (fontCache != null) { 197 fontCache.registerFailedFont(embedUrl, fileLastModified); 198 } 199 return null; 200 } 201 return fontInfoFromCustomFont(fontUrl, customFont, fontCache); 202 } 203 } 204 When thread1 and thread2 both enter IF at line 170, one can remove the font and then the other will get null at line 171.
Are you certain it is a synchronization failure, or did you only draw that conclusion because the app is multi-threaded and the issue does not arise in single-threaded context? If I interpret correctly, this points to something other than a synchronization failure... java.lang.NullPointerException at org.apache.fop.fonts.autodetect.FontInfoFinder.find(FontInfoFinder.java:172) ... 170 if (fontCache.containsFont(embedUrl)) { 171 CachedFontInfo fontInfo = fontCache.getFont(embedUrl); 172 if (fontInfo.lastModified() == fileLastModified) { The NullPointerException must be caused by fontInfo.lastModified(). It therefore appears possible for the FontCache to return true for containsFont(...) and yet return null for getFont(...) with the same URL? Anyway, I quickly checked the source in FOP Trunk, and getFont() doesn't even exist as a method for FontCache anymore. Can you check if the problem persists if you use trunk instead of 0.95?
unfortunately, i can't check the development trunk. we are already in production and thus we would like to update to somewhat stable releases. it's obvious to me, that this issue is most probably caused by lack of synchronization, since calling containsFont and getFont is not atomic. and we already had just the same kind of issue within FOP - see bug 46211. anyway - i can make a workaround for this in my own logic for time being and wait for new stable FOP release.
(In reply to comment #2) > unfortunately, i can't check the development trunk. we are already in > production and thus we would like to update to somewhat stable releases. > > it's obvious to me, that this issue is most probably caused by lack of > synchronization, since calling containsFont and getFont is not atomic. and we > already had just the same kind of issue within FOP - see bug 46211. Yes, now I see it. If I judge correctly, the problem is still present in FOP Trunk. One thread can call find() (non-synchronized), which calls getFontInfos() (non-synchronized), which could call removeFont(). Only the removal is done in a synchronized block... but another thread could indeed at the same time be querying the cache, which explains this issue. In org.apache.fop.fo.properties.PropertyCache (following the example of Java's ConcurrentHashMap), this is solved by performing a get() twice if necessary. Once without synchronization, but if that attempt fails, we still try a second time in a synchronized block. Although here, it seems like the other way around: a check for contains() succeeded, but the font was removed after that call... Not sure what the best solution would be in this case... Confirms what I already knew to be true: programming with taking multi-threading contexts into account is definitely an Art-form... Even when /everything/ goes right during testing, that does not exclude race-conditions from popping up months after a release. Thanks for the report and the feedback!
Investigated this issue closer, and one source of errors seems to be in the silent assumption, in FontInfoFinder.find(), that the fileLastModified variable, set right before containsFont() is called, will always be the same as the cached font file's lastModified() it will be compared to if the call returns true and getFontInfos() is triggered. Not even inlining the call to FontCache.getLastModified() will guarantee that, although it would make occurrences far less likely. It is conceivable that another thread, in the time between those two calls, has added a newer version of the file to the cache, which would result in FontCache.getFontInfos() removing that entry and return null... The file corresponding to the newly added entry is then reparsed in the original thread and added again to the map (operation which is thus performed twice, without good reason) Besides that, as it was pointed out earlier, only the writes to the Map backing FontCache are performed in a synchronized fashion. The reads are not, and so may return results that turn out to be unreliable (i.e. they are guaranteed to be correct only for the duration of the call). Trying to get my head around what actually happens there, seems to show at least /some/ redundancies... FontInfoFinder.find() -> fontCache.containsFont() -> fontFileMap.containsKey() -> fontCache.getFontInfos() -> fontCache.getFontFile() -> fontCache.containsFont() -> fontFileMap.containsKey() -> fontCache.removeFont() -> fontCache.containsFont() -> fontFileMap.containsKey() -> fontFileMap.remove() -> fontFileMap.containsKey() (?) On the other hand, getFontInfos() itself assumes that getFontFile() will always return a non-null result, while the implementation a few lines up shows otherwise. IMO, to make access by multiple threads easily manageable, FontCache has too many public methods. In fact, almost none is used anywhere outside the class. removeFont() is one obvious candidate for a change in visibility. As for the interplay between FontInfoFinder and FontCache, it seems that either the one or the other is assuming too much responsibilities. Can't put my finger yet on which of the two is stepping out of bounds (initial guess would be the latter), but I'm definitely going to check if I can clean that up a little.
I think I found a partial solution to the issue. "Partial" means "safer, but still not ideal" Explanation: I rewrote FontCache.getFontInfos() to something like: --- CachedFontFile cff = getFontFile(embedUrl); if (cff != null) { if (cff.lastModified >= lastModified) { return cff.getEmbedFontInfos(); } synchronized (changeLock) { //redo the fetch to make sure cff = getFontFile(embedUrl); if (cff != null) { if (cff.lastModified < lastModified) { //outdated entry, and we're the first one here, so... getFontFileMap().remove(embedUrl); changed = true; return null; } else { //another thread already detected this, and reloaded return cff.getEmbedFontInfos(); } } } } synchronized (changeLock) { cff = getFontFile(embedUrl); if (cff != null) { //first fetch returned null, but now we do get one... //for simplicity, assume this entry will do return cff.getEmbedFontInfos(); } } return null --- In the best case (font present & up-to-date), there's no synchronization overhead. In the worst case, we would perform three fetches: one non-synchronized (returning an outdated entry), a second synchronized (returning null because another thread already removed the font), and a third one returning... what exactly? Now, it is still not ideal since there is absolutely no guarantee yet that the thread that actually removed the entry will be done with reloading the font (in FontInfoFinder.find()) in time for subsequent threads to obtain the new entry... In theory it is still possible that multiple threads reload the font in parallel, which should actually be avoided, IIC. It does remain safe, since, although multiple threads reload the font, only one of them will actually add the newer entry to the cache. (see FontCache.addFont())
Created attachment 23047 [details] patch proposal Another attempt, slightly better IIC, but still not bullet-proof. If getFontInfos() returns an outdated entry, and another thread has already started the removal/readdition, then the thread will wait until it is notified by either addFont() or registerFailedFont(). Tricky bits: - after the notification, we're still not sure that it stems from the same font we're waiting for... (any addFont() or registerFailedFont() will trigger notifyAll()) - if the removal/readdition has begun, and one thread is busy reloading the font, getFontInfos() will still return null (parallel loading still remains possible) Remaining question (not immediately clear to me): Is it conceivable that one thread removes the outdated entry, yet neither addFont() nor registerFailedFont() is called later? It didn't seem so, but should that be the case, the wait() method should probably be passed a reasonable maximum duration to avoid eternal sleep.
Created attachment 23048 [details] updated patch Slight improvement over the previous patch: Added a Set of 'changingFonts', to keep track of which fonts are in the process of being reloaded. Upon encountering an outdated entry, its embedUrl is added to the Set. Later on, either addFont() or registerFailedFont() will remove it again. Other threads will wait while the font is in transition, and can check the changingFonts set to see if the font corresponding to the embedUrl was already reloaded.
Created attachment 23049 [details] correction previous one still buggy on the transient set...
resetting P2 open bugs to P3 pending further review
change status from ASSIGNED to NEW for consistency