Bug 43728 - Log file loss when specified file is locked by another JVM
Summary: Log file loss when specified file is locked by another JVM
Status: NEEDINFO
Alias: None
Product: Log4j - Now in Jira
Classification: Unclassified
Component: Appender (show other bugs)
Version: 1.2
Hardware: All Windows XP
: P2 critical
Target Milestone: ---
Assignee: log4j-dev
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-10-29 12:39 UTC by Michael Geiser
Modified: 2008-08-02 09:57 UTC (History)
1 user (show)



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Geiser 2007-10-29 12:39:18 UTC
 
Comment 1 Michael Geiser 2007-10-29 12:49:13 UTC
Using a clustered WebLogic server with two managed server, we misconfigured 
the DailyRollingFileAppender so that both managed servers to use the same log 
file.  

Each managed server runs in it's own JVM and the two JVMS cannot have an 
exclusive lock on the same file (duh!)

I noticed that the log files for previous intervals frequently contained data 
from the wrong time interval, some data was entirely missing and that you have 
to look in the current log as well as the log for the previous interval to 
ensure you get all the logged information.

It appeared one of the JVMs wrote to the previous interval log and over-wrote 
the logs from that interval.

To conclusively demonstrate this was the case, I left both Log4j config files 
specify the same log file but added the sting "MS1" and "MS2" to the 
ConversionPattern parameter for each log file so that it was apparent which 
managed server was writing to which log file.

I saw when both Managed Servers try to log to the same log, one managed server 
writes to the specified file (the Managed Server that does get the lock 
varies) and the other managed server writes to the previous interval’s log 
file.  Since the appenders also do not specify to append to (but instead 
overwrite) files, the previous interval’s log file is overwritten with new 
data.  This also confirmed that a managed server always wrote to the same log 
file instead of each managed server opportunistically writing to any unlocked 
log file. 

We can see demonstrated this below. The first few bytes of the first lines of 
the files are associated with the log file name.    

Prior to 10:00am, only one managed server (MS1) had any activity to log and it 
logged to the docmgt.log file.  
 
At 10:00am, I ran a test were a client ran 10 treads calling an EJB via RMI.  
The 10 calls were balanced by WebLogic across the two managed servers in the 
domain.  When this test ran and both Managed Servers were being used, the log 
data from MS1 that was in the docmgt.log file was now located in 
docmgt.log.2007-10-28-PM (the docmgt.log file at 9:58 was renamed) and the 
data from MS2 was logged to the docmgt.log file.  The data that was in 
docmgt.log.2007-10-28-PM prior to 10:00am was over-written and lost. 

at 9:58 am 
docmgt.log                |MS1|20071029||08:56:05,850||
docmgt.log.2007-10-28-PM  |MS2|20071028||21:20:08,318||
docmgt.log.2007-10-27-PM  |MS1|20071028||21:19:41,689||
docmgt.log.2007-10-27-AM  |MS2|20071027||19:55:26,386||
docmgt.log.2007-10-26-PM  |MS1|20071026||12:59:47,760||  
docmgt.log.2007-10-26-AM  |MS1|20071027||00:46:54,637||
docmgt.log.2007-10-25-PM  |MS2|20071026||09:15:08,077||

at 10:01
docmgt.log                |MS2|20071029||10:00:22,993||                
docmgt.log.2007-10-28-PM  |MS1|20071029||08:56:05,850||
docmgt.log.2007-10-27-PM  |MS1|20071028||21:19:41,689||
docmgt.log.2007-10-27-AM  |MS2|20071027||19:55:26,386||
docmgt.log.2007-10-26-PM  |MS1|20071026||12:59:47,760||  
docmgt.log.2007-10-26-AM  |MS1|20071027||00:46:54,637||
docmgt.log.2007-10-25-PM  |MS2|20071026||09:15:08,077||

Although we could specify the append parameter be true, overwritting the file 
in this case is clearly a bug.  It is important to perform the log, but there 
is not alert that there is a file contention and it would be infinately 
preferable that the previous interval log NOT be overwritten.  add a short 
random sting or time stamp to the log file name specified in the appender if 
nothing else.





Comment 2 Curt Arnold 2007-11-09 12:18:26 UTC
I don't know why this was just marked assigned.  I started writing a comment on this a week or so ago 
and got side tracked.  

The proposed solution would help their strongly discouraged case, but could break a legitimate case 
that depended on overwriting existing files.  If you wanted to keep hourly log files, but only until the 
next day (or similarly with daily files for a month, etc), you could use a pattern that suppresses the 
more significant elements of the time.  For example, if your pattern was hourly-log-%d(HH}.log, then 
rolling would discard yesterday's file from the same hour.

Since there is a valid use and existing users may depend on that behavior and the "fix" only temporarily 
improves life for people that are doing what is strongly discouraged, I don't see that proposed solution 
is acceptible.
Comment 3 hanif 2007-12-11 08:16:05 UTC
We have a similar situation. However, in our case its not a misconfiguration. 
We are using an Oracle application which runs on Websphere 5 under AIX 5. This 
combination of application server and operating system means we are only able 
to use a limited amount of heap (around 2Gb max). Therefore, we are forced to 
use multiple JVMs on each of the nodes (in a 2 node cluster). We are 
experiencing similar behaviour. However, we have observed the writing by the 
various JVMs to rolled-over files but I not sure that we have observed file 
truncation by overwriting.

It sounds very much like the exclusive file lock problem mentioned by Michael.

Regards
Hanif
Oracle IDM Developer
Comment 4 Michael Geiser 2007-12-11 11:06:48 UTC
Hanif:

In your case, the two JVMs and in the same node, in my case the two JVMs are 
in separate nodes, but the end result is the same; two instances of Jog4j are 
running in two JVMs trying to write to the same log file.  

I'm going to change the settings to append to existing logs instead of 
overwrite so my data is not destroyed. I still believe it is wrong that data 
is written to a previous - and wrong - interval log.  It should be straight 
forward to come up with a predicable behavior in this situation so that if a 
new file must be created, a log file monitor would also pick up the other file 
as well.


Comment 5 Curt Arnold 2007-12-11 11:41:21 UTC
When I said "misconfiguration", I did not mean that it was unintentional, but that it is was known to be 
problematic and should be avoided.  As far as I can tell, it can't be "fixed" (at least in a platform 
independent manner), so don't do it.

The writing-to-a-previous log file scenario is at the OS level.  If you take any program on a Unix that 
does file IO and have some other process mv or del the file that it is writing to, the program will continue 
to write to the file in its new location.  It is not informed that the directory entry pointing to the file has 
been modified.  The write-to-a-previous log file occurs when one JVM is currently writing to the current 
log file and then another JVM effectively mv's the file and creates a new file.
Comment 6 Chintaka Ranasinghe 2008-01-09 15:17:46 UTC
I noticed that the rollOver() method in RollingAppender class does not check for
the return value of  file.renameTo(target). Java doc for File.rename says that
the caller should always check the value to see if the operation was a
success.We noticed that rename method returns 'false' when appender tries to
write to a locked file.     
Comment 7 Curt Arnold 2008-01-17 09:59:44 UTC
For comment 6, could you confirm that you are looking at log4j 1.2.15 or later.  Bug 41735 addressed 
missing checks of the status of File.rename.
Comment 8 Thomas Weinschenk 2008-03-19 05:43:30 UTC
I have the same problem when two webapps (in Tomcat) using the same file. I am using the 1.2.15
Comment 9 Thorbjørn Ravn Andersen 2008-06-30 13:11:50 UTC
Is this the same underlying issue as reported in 44867 ?