Bug 1782 - Possible race condition in BayesStore
Summary: Possible race condition in BayesStore
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Learner (show other bugs)
Version: 2.53
Hardware: Other other
: P5 normal
Target Milestone: 2.54
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords: backport
Depends on:
Blocks:
 
Reported: 2003-04-14 14:40 UTC by Eugene Miretsky
Modified: 2003-05-06 21:38 UTC (History)
0 users



Attachment Type Modified Status Actions Submitter/CLA Status
fix this bug by using syswrite() instead of print(). also be paranoid and warn if syswrite() is only writing partially patch None Justin Mason [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Eugene Miretsky 2003-04-14 14:40:39 UTC
BayesStore.pm incorrectly assumes that open for append in perl will
do file locking:

Quite from scan_count_increment and add_touches_to journal:

  # Use filesystem-level append operations.  These are very fast, and
  # on a local disk on UNIX at least, guaranteed not to overwrite another
  # process' changes.   ... etc

This is not correct.

A simple way to check this assumption is to run:
strace perl -e 'open(X, ">>/tmp/$$"); print X "."x8192 . "\n"; close(X);'

Here is example output (only relevant options):
open("/tmp/69050", O_WRONLY|O_APPEND|O_CREAT, 0666) = 3
lseek(3, 0, SEEK_END)                   = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
break(0x8109000)                        = 0
break(0x810c000)                        = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
break(0x810e000)                        = 0
write(3, "................................"..., 8192) = 8192
write(3, "\n", 1)                       = 1
close(3)                                = 0

As you can see, open is done with O_WRONLY|O_APPEND|O_CREAT bits set, without
O_EXCL flag, which means that if the process were to be interrupted 
just before first write() or interrupted after first write() (which can happen
if we have _a lot_ of data to write), data curruption may occur.

It's difficult but possible to recreate this race condition (I did it once, but
I will not try to do it again :) ).  

It seems that spamassassin will gracefuly handle giberish lines in bayes_jornal,
so this is not a major problem.  However, I think flocks must be added for all
open for append operations (there are couple places).
Comment 1 Antony Mawer 2003-04-14 17:40:43 UTC
Subject: Re: [SAdev]  New: Possible race condition in BayesStore 


> A simple way to check this assumption is to run:
> strace perl -e 'open(X, ">>/tmp/$$"); print X "."x8192 . "\n"; close(X);'
> 
> Here is example output (only relevant options):
> open("/tmp/69050", O_WRONLY|O_APPEND|O_CREAT, 0666) = 3
> lseek(3, 0, SEEK_END)                   = 0
> fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
> break(0x8109000)                        = 0
> break(0x810c000)                        = 0
> fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> break(0x810e000)                        = 0
> write(3, "................................"..., 8192) = 8192
> write(3, "\n", 1)                       = 1
> close(3)                                = 0

ick! that's bad behaviour by perl.

O_APPEND *should* mean that a write() will be atomic, and will not
be lost.  we don't care if *both* happen, as long as neither
are lost, and each write() operation does not intermingle with
the other.  That's what POSIX O_APPEND will guarantee.

Unfortunately perl is breaking it up into 2 write() operations!

So changing the stuff to

  $data = "foo".$bar."\n";
  print OUT $data;
  close OUT;

will not run into that problem I should think.


> It seems that spamassassin will gracefuly handle giberish lines in
> bayes_jorn al, so this is not a major problem.  However, I think flocks
> must be added for al l open for append operations (there are couple
> places).

That's exactly what the journal is intended to avoid ;)  Let's try to
avoid that...

--j.

Comment 2 Eugene Miretsky 2003-04-15 07:09:27 UTC
Subject: Re: [SAdev]  Possible race condition in BayesStore

On Mon, Apr 14, 2003 at 05:40:44PM -0700, bugzilla-daemon@hughes-family.org wrote:
> http://www.hughes-family.org/bugzilla/show_bug.cgi?id=1782
> 
> 
> 
> 
> 
> ------- Additional Comments From ajmawer@optusnet.com.au  2003-04-14 17:40 -------
> Subject: Re: [SAdev]  New: Possible race condition in BayesStore 
> 
> 
> > A simple way to check this assumption is to run:
> > strace perl -e 'open(X, ">>/tmp/$$"); print X "."x8192 . "\n"; close(X);'
> > 
> > Here is example output (only relevant options):
> > open("/tmp/69050", O_WRONLY|O_APPEND|O_CREAT, 0666) = 3
> > lseek(3, 0, SEEK_END)                   = 0
> > fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> > fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
> > break(0x8109000)                        = 0
> > break(0x810c000)                        = 0
> > fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> > break(0x810e000)                        = 0
> > write(3, "................................"..., 8192) = 8192
> > write(3, "\n", 1)                       = 1
> > close(3)                                = 0
> 
> ick! that's bad behaviour by perl.
> 
> O_APPEND *should* mean that a write() will be atomic, and will not
> be lost.  we don't care if *both* happen, as long as neither
> are lost, and each write() operation does not intermingle with
> the other.  That's what POSIX O_APPEND will guarantee.
> 
> Unfortunately perl is breaking it up into 2 write() operations!

Unfortunately, it's not just perl, but standard C libraries on
both Linux (RH) and FreeBSD seem to do.

Here is a small C program to test this:

#include <stdio.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

int main(int argc, char** argv) {
  FILE *fp;
  char buf[8194];
  int f, i;

  for(i=0; i<8193; i++) {buf[i]='.';}
  buf[8193]=0;

  f = open("/tmp/x", O_CREAT|O_APPEND|O_WRONLY);
  write(f, buf, 8193);
  close(f);

  fp = fopen("/tmp/x1", "a");
  fprintf(fp, "%s", buf);
  fclose(fp);

}

Compile it and run it under strace.  Here is relevant output:
open("/tmp/x", O_WRONLY|O_APPEND|O_CREAT, 0) = 3
write(3, "................................"..., 8193) = 8193
close(3)                                = 0
open("/tmp/x1", O_WRONLY|O_APPEND|O_CREAT, 0666) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40014000
fstat64(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
_llseek(3, 0, [0], SEEK_SET)            = 0
write(3, "................................"..., 8192) = 8192
write(3, ".", 1)                        = 1
close(3)                                = 0

As you can see, low level IO does exactly what it's being asked -- i.e. writes
8193 bytes.  But, by going through stdio which uses buffer of 8192, it's clear
that seek is not done on every write -- so, it's not realy Perl behavior to
blame, but standard lib.

> 
> So changing the stuff to
> 
>   $data = "foo".$bar."\n";
>   print OUT $data;
>   close OUT;
> 
> will not run into that problem I should think.

I don't think that this will help.  Again, perl print uses stdio, which uses 
buffered io, which will not work correctly if we are writing a lot of data.

> 
> 
> > It seems that spamassassin will gracefuly handle giberish lines in
> > bayes_jorn al, so this is not a major problem.  However, I think flocks
> > must be added for al l open for append operations (there are couple
> > places).
> 
> That's exactly what the journal is intended to avoid ;)  Let's try to
> avoid that...
> 

Yes, that's what it tries to avoid, but it introduces a race condition :(

Comment 3 Antony Mawer 2003-04-15 11:59:32 UTC
Subject: Re: [SAdev]  Possible race condition in BayesStore 


> As you can see, low level IO does exactly what it's being asked -- i.e.
> write s 8193 bytes.  But, by going through stdio which uses buffer of
> 8192, it's clea r that seek is not done on every write -- so, it's not
> realy Perl behavior to blame, but standard lib.

ick *2!

I'd better run through the problem for background.
Basically, we do these write operations on the bayes_toks db:

  - write new token counts (from sa-learn or autolearn).

  - update access time on tokens used in scanning (required for
    useful expiry).

Since a busy site could have, let's say, 10 processes all scanning in
parallel, the latter operation cannot lock the bayes_toks db, since that
would serialize all 10 processes on the lock before a scan can complete.

Hence we use a journal for the latter, which is sync'ed up periodically
before the expiration takes place, or if the user runs sa-learn --rebuild.

The idea behind the journal was to:

  open (JOURNAL, ">>journal")
  write atomic data (a)
  close JOURNAL

As long as (a) is an atomic write and the semantics of >> are correctly
used, it will be fine.

Initially this used to be atomic, because it wrote only small data; it
looks like someone recently changed the code to marshal up a big chunk
and write it all at once, which is why this bug is appearing.

So I suggest we just go back to small one-line writes instead.

As explained though, *locking* the journal will be very bad news; we
basically cannot do that at the SpamAssassin level, since we'll lose
the entire benefit of the journal and slow big sites down massively.

--j.

Comment 4 Eugene Miretsky 2003-04-15 12:12:57 UTC
Subject: Re: [SAdev]  Possible race condition in BayesStore

On Tue, Apr 15, 2003 at 11:59:32AM -0700, bugzilla-daemon@hughes-family.org wrote:
> 
> Hence we use a journal for the latter, which is sync'ed up periodically
> before the expiration takes place, or if the user runs sa-learn --rebuild.
> 
> The idea behind the journal was to:
> 
>   open (JOURNAL, ">>journal")
>   write atomic data (a)
>   close JOURNAL
> 
> As long as (a) is an atomic write and the semantics of >> are correctly
> used, it will be fine.
> 
> Initially this used to be atomic, because it wrote only small data; it
> looks like someone recently changed the code to marshal up a big chunk
> and write it all at once, which is why this bug is appearing.

I never said that I saw this bug in action -- just that it's a potential bug :)

> 
> So I suggest we just go back to small one-line writes instead.
> 

That might be a good idea but it will work only if you use nonbuffered io.


> As explained though, *locking* the journal will be very bad news; we
> basically cannot do that at the SpamAssassin level, since we'll lose
> the entire benefit of the journal and slow big sites down massively.

Totaly agree on this issue.

Comment 5 Antony Mawer 2003-04-15 12:51:08 UTC
Subject: Re: [SAdev]  Possible race condition in BayesStore 


> I never said that I saw this bug in action -- just that it's a
> potential bug :)
phew, that's good news ;)

> > So I suggest we just go back to small one-line writes instead.
> That might be a good idea but it will work only if you use nonbuffered io.

Actually, there's another workaround that just hit me; use syswrite()
directly and avoid all use of "print" etc., so we don't pass through
stdio.   I'll fix it to do that for now.

Comment 6 Justin Mason 2003-04-15 13:03:42 UTC
Created attachment 886 [details]
fix this bug by using syswrite() instead of print(). also be paranoid and warn if syswrite() is only writing partially
Comment 7 Justin Mason 2003-04-15 13:04:51 UTC
OK, patch that fixes it by using syswrite() to call write(2) directly.
"backport" kw added, but I don't know how high-priority it is....
Comment 8 Eugene Miretsky 2003-04-15 14:45:50 UTC
Subject: Re: [SAdev]  Possible race condition in BayesStore

I think you need an additional 
  sysseek(OUT, SEEK_END, 0);

just after the while loop because if syswrite fails, you still
want to try to append to the end of the file on the next iteration.



On Tue, Apr 15, 2003 at 01:04:51PM -0700, bugzilla-daemon@hughes-family.org wrote:
> http://www.hughes-family.org/bugzilla/show_bug.cgi?id=1782
> 
> jm@jmason.org changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>            Keywords|                            |backport
>    Target Milestone|2.60                        |2.54
> 
> 
> 
> ------- Additional Comments From jm@jmason.org  2003-04-15 13:04 -------
> OK, patch that fixes it by using syswrite() to call write(2) directly.
> "backport" kw added, but I don't know how high-priority it is....
> 
> 
> 
> ------- You are receiving this mail because: -------
> You are the assignee for the bug, or are watching the assignee.
> 
> 
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> Spamassassin-devel mailing list
> Spamassassin-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/spamassassin-devel

Comment 9 Antony Mawer 2003-04-15 19:05:33 UTC
Subject: Re: [SAdev]  Possible race condition in BayesStore 


> I think you need an additional 
>   sysseek(OUT, SEEK_END, 0);
> just after the while loop because if syswrite fails, you still
> want to try to append to the end of the file on the next iteration.

I don't think so... open(2) manpage states:

   O_APPEND
	  The  file  is opened in append mode. Before each write, the file
	  pointer is positioned at the end of the file, as if with  lseek.

so it's implicit.   (that strace did indicate that perl was correctly
using O_APPEND iirc.)

--j.

Comment 10 Eugene Miretsky 2003-04-16 04:23:11 UTC
Subject: Re: [SAdev]  Possible race condition in BayesStore

On Tue, Apr 15, 2003 at 07:04:33PM -0700, Justin Mason wrote:
> 
> > I think you need an additional 
> >   sysseek(OUT, SEEK_END, 0);
> > just after the while loop because if syswrite fails, you still
> > want to try to append to the end of the file on the next iteration.
> 
> I don't think so... open(2) manpage states:
> 
>    O_APPEND
> 	  The  file  is opened in append mode. Before each write, the file
> 	  pointer is positioned at the end of the file, as if with  lseek.
> 
> so it's implicit.   (that strace did indicate that perl was correctly
> using O_APPEND iirc.)

Yes, lseek is done after open.  But what happens if you fail with your syswrite
and write partial data?  Then you'll go back to the while loop, your program 
gets interrupted while other process writes some more data, which you will cobber
because your file descriptor no longer points to the end of the file...

I'm pretty sure you need sysseek after while loop().

> 
> --j.

Comment 11 Theo Van Dinter 2003-05-03 15:01:50 UTC
OKAY: the patch looks good to me, although I'm kind of upset we have to jump 
through the hoop in the first place ...
Comment 12 Theo Van Dinter 2003-05-07 05:38:54 UTC
applied to 2.54.  closing.