SA Bugzilla – Bug 2114
sa-learn OOMing on many large mboxes
Last modified: 2004-09-28 10:16:45 UTC
# ls -la TSPAM0*
-rw-r--r-- 1 antivirussystem 13845896 Jun 21 14:04 TSPAM0204
-rw-r--r-- 1 antivirussystem 24520587 Jun 21 14:12 TSPAM0205
-rw-r--r-- 1 antivirussystem 21100203 Jun 21 14:32 TSPAM0206
-rw-r--r-- 1 antivirussystem 15859002 Jun 21 14:31 TSPAM0207
-rw-r--r-- 1 antivirussystem 35457339 Jun 21 14:35 TSPAM0208
-rw-r--r-- 1 antivirussystem 36673111 Jun 21 14:36 TSPAM0209
-rw-r--r-- 1 antivirussystem 42233387 Jun 21 14:37 TSPAM0210
-rw-r--r-- 1 antivirussystem 51930254 Jun 21 14:39 TSPAM0211
-rw-r--r-- 1 antivirussystem 67869903 Jun 21 14:41 TSPAM0212
-rw-r--r-- 1 antivirussystem 55532171 Jun 21 14:42 TSPAM0301
-rw-r--r-- 1 antivirussystem 54315412 Jun 21 14:44 TSPAM0302
-rw-r--r-- 1 antivirussystem 50897005 Jun 21 14:48 TSPAM0303
-rw-r--r-- 1 antivirussystem 54452400 Jun 21 14:56 TSPAM0304
-rw-r--r-- 1 antivirussystem 53019445 Jun 21 15:02 TSPAM0305

# /usr/local/bin/sa-learn --spam --mbox --C /etc/mail/spamassassin/user_prefs TSPAM0*
Learned from (can't remember how many) messages
Out of memory!

# ls -la bayes/
total 36956
drwxr-xr-x 2 antivirussystem 8192 Jun 22 08:37 .
drwxrwxrwx 3 antivirussystem 8192 Jun 22 08:47 ..
-rw------- 1 antivirussystem 13 Jun 22 08:37 .lock
-rw-rw-rw- 1 antivirussystem 2453 Jun 22 08:37 _msgcount
-rw-rw-rw- 1 antivirussystem 5136384 Jun 22 08:37 _seen
-rw-rw-rw- 1 antivirussystem 42049536 Jun 22 08:37 _toks
-rw-rw-rw- 1 antivirussystem 32768 Jun 21 19:18 _toks.new
------------------
I want to add more messages, but sa-learn cannot add any more and only fails with "Out of memory!", whereas spamprobe can add more messages:

[antivirus .spamprobe]$ ls -la
total 63108
drwxrwxr-x 2 antivirus system 4096 Jun 6 10:54 ./
drwx------ 20 antivirus system 4096 Jun 12 16:47 ../
-rw-rw-r-- 1 antivirus system 0 May 16 13:54 lock
-rw------- 1 antivirus system 64544768 Jun 22 06:41 sp_words
------------------
Can the memory problem be fixed?
could you attach -D output from sa-learn?
$ sa-learn --ham --mbox SPAMNOT -D >&results.txt
...SNIP...
debug: bayes: Current scan count 77548 > 65535, resetting atimes to 0
Out of memory!
Learned from 31 messages.
Subject: Re: sa-learn OOMing on many large mboxes

--------
$ sa-learn --ham --mbox SPAMNOT -D >&results.txt
...SNIP...
debug: bayes: Current scan count 77548 > 65535, resetting atimes to 0
Out of memory!
Learned from 31 messages.
-------

Has posted and emailed here @ same time.

Phil.

----- Original Message -----
From: <bugzilla-daemon@bugzilla.spamassassin.org>
To: <plchoy@income.com.sg>
Sent: Wednesday, July 02, 2003 3:12 AM
Subject: [Bug 2114] sa-learn OOMing on many large mboxes

> http://bugzilla.spamassassin.org/show_bug.cgi?id=2114
>
> jm@jmason.org changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>             Summary|Out of Memory!              |sa-learn OOMing on many
>                    |                            |large mboxes
>
> ------- Additional Comments From jm@jmason.org 2003-07-01 12:10 -------
> could you attach -D output from sa-learn?
>
> ------- You are receiving this mail because: -------
> You reported the bug, or are watching the reporter.
Subject: Re: [SAdev] sa-learn OOMing on many large mboxes

>...SNIP...

could you post this bit too? ;)
actually, I see why you're SNIPping. there's a lot of tokens there. No need to post those, just SNIP out the "token" lines -- it's the "ntokens" etc. data that's important. BTW, what are the memory limits on the machine? It sounds like it's reading the entire db into RAM to perform the upgrade process, and that's the problem.
Just to note: "debug: bayes: Current scan count 77548 > 65535, resetting atimes to 0" is not an upgrade. It's the expire hack to fix the 16-bit atimes. During the reset, the old tokens are sent directly into the new db. However, there is a 'foreach my $tok (keys %{$self->{db_toks}}) {', and that builds a temporary array holding every token key in the db. So ...
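To illustrate the point about the temporary array, here is a minimal sketch written in Python rather than Perl. It is an analogue, not SpamAssassin's code: the function names and the (spam_count, ham_count, atime) tuple layout are hypothetical, and a plain dict stands in for the tied DB file. The first function builds the complete key list before looping, as `foreach (keys %hash)` does in Perl; the second walks the store one entry at a time, as Perl's `each` does.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical value layout for illustration only; not the real Bayes DB format.
Token = Tuple[int, int, int]  # (spam_count, ham_count, atime)


def reset_atimes_materialized(old_db: Dict[str, Token]) -> Dict[str, Token]:
    """Copy tokens with atime reset to 0, building the full key list first."""
    new_db: Dict[str, Token] = {}
    all_keys = list(old_db.keys())  # temporary list with one entry per token
    for tok in all_keys:
        spam, ham, _atime = old_db[tok]
        new_db[tok] = (spam, ham, 0)
    return new_db


def iter_tokens(old_db: Dict[str, Token]) -> Iterator[Tuple[str, Token]]:
    """Yield one (token, value) pair at a time without building a key list."""
    for tok in old_db:
        yield tok, old_db[tok]


def reset_atimes_streaming(
    old_db: Dict[str, Token], new_db: Optional[Dict[str, Token]] = None
) -> Dict[str, Token]:
    """Copy tokens with atime reset to 0, holding one entry at a time."""
    new_db = {} if new_db is None else new_db
    for tok, (spam, ham, _atime) in iter_tokens(old_db):
        new_db[tok] = (spam, ham, 0)
    return new_db
```

With an in-memory dict the two variants cost about the same. The difference only matters when the store is disk-backed, as here, where the key list can be far larger than anything else the process holds.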
Subject: Re: sa-learn OOMing on many large mboxes

Never mind.. Posting entire attachment instead.

Using dual Alpha CPUs with 2GB RAM.

Thanx.

Phil.
Subject: Re: sa-learn OOMing on many large mboxes

Doh.. Don't U get attachment results.txt here?

Phil.
Subject: Re: sa-learn OOMing on many large mboxes

Don't know if the attachment was posted, so I'm copying and pasting it here. Hope it helps.

=======================
debug: Score set 0 chosen.
debug: running in taint mode? no
debug: using "/usr/local/share/spamassassin" for default rules dir
debug: using "/etc/mail/spamassassin" for site rules dir
debug: using "/home/av/.spamassassin/user_prefs" for user prefs file
debug: bayes: 396824 tie-ing to DB file R/O /home/av/mail/bayes2/_toks
debug: bayes: 396824 tie-ing to DB file R/O /home/av/mail/bayes2/_seen
debug: debug: Only 26 ham(s) in Bayes DB < 200
debug: bayes: 396824 untie-ing
debug: bayes: 396824 untie-ing db_toks
debug: bayes: 396824 untie-ing db_seen
debug: Score set 0 chosen.
debug: Initialising learner
debug: Initialising learner
debug: Learning Ham
debug: bayes: 396824 untie-ing
debug: lock: 396824 created /home/av/mail/bayes2/.lock.vxfx.396824
debug: lock: 396824 trying to get lock on /home/av/mail/bayes2/ with 0 retries
debug: lock: 396824 link to /home/av/mail/bayes2/.lock: link ok
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_toks
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_seen
debug: 1056283230@vxfx.income.com.sg: already learnt correctly, not learning twice
debug: Removing Markup
debug: Learning Ham
debug: bayes: 396824 untie-ing
debug: bayes: 396824 untie-ing db_toks
debug: bayes: 396824 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 396824 unlink /home/av/mail/bayes2/.lock
debug: lock: 396824 created /home/av/mail/bayes2/.lock.vxfx.396824
debug: lock: 396824 trying to get lock on /home/av/mail/bayes2/ with 0 retries
debug: lock: 396824 link to /home/av/mail/bayes2/.lock: link ok
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_toks
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_seen
debug: 200306220651.h5M6p9xH369368@vxfx.income.com.sg: already learnt correctly, not learning twice
debug: Removing Markup
debug: Learning Ham
debug: bayes: 396824 untie-ing
debug: bayes: 396824 untie-ing db_toks
debug: bayes: 396824 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 396824 unlink /home/av/mail/bayes2/.lock
debug: lock: 396824 created /home/av/mail/bayes2/.lock.vxfx.396824
debug: lock: 396824 trying to get lock on /home/av/mail/bayes2/ with 0 retries
debug: lock: 396824 link to /home/av/mail/bayes2/.lock: link ok
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_toks
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_seen
debug: 200306220300.h5M306xH323036@vxfx.income.com.sg: already learnt correctly, not learning twice
debug: Removing Markup
debug: Learning Ham
debug: bayes: 396824 untie-ing
debug: bayes: 396824 untie-ing db_toks
debug: bayes: 396824 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 396824 unlink /home/av/mail/bayes2/.lock
debug: lock: 396824 created /home/av/mail/bayes2/.lock.vxfx.396824
debug: lock: 396824 trying to get lock on /home/av/mail/bayes2/ with 0 retries
debug: lock: 396824 link to /home/av/mail/bayes2/.lock: link ok
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_toks
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_seen
debug: 200306220312.h5M3CcxH306926@vxfx.income.com.sg: already learnt correctly, not learning twice
debug: Removing Markup
.... SNIP ...
debug: Learning Ham
debug: bayes: 396824 untie-ing
debug: bayes: 396824 untie-ing db_toks
debug: bayes: 396824 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 396824 unlink /home/av/mail/bayes2/.lock
debug: lock: 396824 created /home/av/mail/bayes2/.lock.vxfx.396824
debug: lock: 396824 trying to get lock on /home/av/mail/bayes2/ with 0 retries
debug: lock: 396824 link to /home/av/mail/bayes2/.lock: link ok
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_toks
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_seen
debug: tokenize: header tokens for *p = "<updates@lolfun.com>"
debug: tokenize: header tokens for *m = " 200306271322 h5RDLxGs110315 vxfx income com sg "
debug: tokenize: header tokens for *F = "LOLFUN <updates@lolfun.com>"
debug: tokenize: header tokens for To = "eloke@income.com.sg"
debug: tokenize: header tokens for X-CID = "8777838"
debug: tokenize: header tokens for X-ENVID = "FGC-30641"
debug: tokenize: header tokens for X-Keywords = ""
debug: tokenize: header tokens for *r = " MAIL3065 (127.0.0) by MAIL3065.flowgo.com (PowerMTA(TM) v1.5); envelope- <updates@lolfun.com>)"
debug: tokenize: header tokens for *r = " MAIL3065 (127.0.0) by MAIL3065.flowgo.com (PowerMTA(TM) v1.5); envelope- <updates@lolfun.com>) MAIL3065.flowgo.com (mail3065.flowgo.com [12.129.205]) by vxfx.income.com.sg (8.12.9/8.11.6) <eloke@income.com.sg>; "
debug: bayes: Current scan count 77548 > 65535, resetting atimes to 0
Out of memory!
Learned from 31 messages.
assigning to 2.61
*** Bug 2232 has been marked as a duplicate of this bug. ***
Can you please try out 2.60 and see if you have the same issue? The Bayes backend was overhauled, so you will hopefully not have any problems. Moving to 2.70
sa-learn --nonspam on a 1,100 message mailbox just OOMed my 512MB RAM 1GB swap dual athlon. This is with a new SpamAssassin 2.6.1.
ok, let's step back a second... there are 2 issues here as far as I can see:

1) sa-learn mode in ArchiveIterator causes the list of messages to use to be stored in memory ... So sa-learn first goes through all the files, finds the offsets for each message, and stores that in memory plus some metadata. So if you're learning a huge # of messages, that can be pretty big.

2) expire/update and the various hack atime reset code were using foreach() in earlier versions of SA, which caused (iirc) a temp array to be created in memory with the full list of tokens in the DB, which can be pretty big too.

#1 still happens, but you don't seem to be getting OOMed at that stage. #2's expiry/etc issue was fixed somewhere in the 2.6x series (2.62 I think), and the atime hackery is no longer done post 2.5x.

So I guess what I'm trying to get to is this: Does the problem still occur with 2.63? If you can, does it occur with 3.0.0? If it still occurs, we'll have to do a bit more debugging.

#1 is easily solvable, at least on UNIX -- we already do it for mass-check runs (store the list of message offsets in a temp file and do some fork() fun to avoid the memory hit over the learning period). It's sort of a hack, which is why it's not the default for sa-learn. But I'm guessing this isn't your problem. For me, the storage for ~200k message offsets only adds ~60MB to the process memory usage in testing/mass-check.

If #2 is occurring... more debugging. ;)
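The temp-file approach mentioned for #1 can be sketched roughly as follows. This is an illustration in Python, not the ArchiveIterator or mass-check code: the function names are hypothetical, the fork() part is left out, and a message is assumed to start at any line beginning with "From ", which is a simplification of real mbox parsing. Offsets are written to an index file as fixed-width 8-byte integers, so the process never holds the whole offset list in memory.

```python
import struct
from typing import BinaryIO, Iterator

OFFSET_FORMAT = "<Q"  # one unsigned 64-bit little-endian offset per message
OFFSET_SIZE = struct.calcsize(OFFSET_FORMAT)


def index_mbox(mbox_path: str, index_file: BinaryIO) -> int:
    """Write the byte offset of each message to index_file; return the count."""
    count = 0
    offset = 0
    with open(mbox_path, "rb") as mbox:
        for line in mbox:
            if line.startswith(b"From "):  # simplified message separator check
                index_file.write(struct.pack(OFFSET_FORMAT, offset))
                count += 1
            offset += len(line)
    index_file.flush()
    return count


def iter_messages(mbox_path: str, index_file: BinaryIO) -> Iterator[bytes]:
    """Yield messages one at a time, reading one offset per step."""
    index_file.seek(0)
    with open(mbox_path, "rb") as mbox:
        while True:
            raw = index_file.read(OFFSET_SIZE)
            if len(raw) < OFFSET_SIZE:
                break  # end of the index
            (start,) = struct.unpack(OFFSET_FORMAT, raw)
            mbox.seek(start)
            lines = [mbox.readline()]  # the "From " separator line itself
            for line in mbox:
                if line.startswith(b"From "):
                    break  # next message begins here
                lines.append(line)
            yield b"".join(lines)
```

A caller would open the index with something like tempfile.TemporaryFile(), run index_mbox() once per mailbox, then feed each message from iter_messages() to the learner.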
bugs for 3.0.1 milestone
not hearing anything in 7 months, I'm closing the ticket as wfm. if the problem still occurs with a released 3.0 version, we can reopen and investigate.