Bug 2114 - sa-learn OOMing on many large mboxes
Status: RESOLVED WORKSFORME
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Learner
Version: 2.55
Hardware: DEC OSF/1
Importance: P5 normal
Target Milestone: 3.0.1
Assignee: SpamAssassin Developer Mailing List
Duplicates: 2232
Reported: 2003-06-21 17:47 UTC by Philip Choy
Modified: 2004-09-28 10:16 UTC (History)
Description Philip Choy 2003-06-21 17:47:49 UTC
# ls -la TSPAM0*
-rw-r--r--   1 antivirussystem   13845896 Jun 21 14:04 TSPAM0204
-rw-r--r--   1 antivirussystem   24520587 Jun 21 14:12 TSPAM0205
-rw-r--r--   1 antivirussystem   21100203 Jun 21 14:32 TSPAM0206
-rw-r--r--   1 antivirussystem   15859002 Jun 21 14:31 TSPAM0207
-rw-r--r--   1 antivirussystem   35457339 Jun 21 14:35 TSPAM0208
-rw-r--r--   1 antivirussystem   36673111 Jun 21 14:36 TSPAM0209
-rw-r--r--   1 antivirussystem   42233387 Jun 21 14:37 TSPAM0210
-rw-r--r--   1 antivirussystem   51930254 Jun 21 14:39 TSPAM0211
-rw-r--r--   1 antivirussystem   67869903 Jun 21 14:41 TSPAM0212
-rw-r--r--   1 antivirussystem   55532171 Jun 21 14:42 TSPAM0301
-rw-r--r--   1 antivirussystem   54315412 Jun 21 14:44 TSPAM0302
-rw-r--r--   1 antivirussystem   50897005 Jun 21 14:48 TSPAM0303
-rw-r--r--   1 antivirussystem   54452400 Jun 21 14:56 TSPAM0304
-rw-r--r--   1 antivirussystem   53019445 Jun 21 15:02 TSPAM0305
# /usr/local/bin/sa-learn --spam --mbox --C /etc/mail/spamassassin/user_prefs TSPAM0*
Learned from (can't remember how many) messages
Out of memory!

# ls -la bayes/
total 36956
drwxr-xr-x   2 antivirussystem       8192 Jun 22 08:37 .
drwxrwxrwx   3 antivirussystem       8192 Jun 22 08:47 ..
-rw-------   1 antivirussystem         13 Jun 22 08:37 .lock
-rw-rw-rw-   1 antivirussystem       2453 Jun 22 08:37 _msgcount
-rw-rw-rw-   1 antivirussystem    5136384 Jun 22 08:37 _seen
-rw-rw-rw-   1 antivirussystem   42049536 Jun 22 08:37 _toks
-rw-rw-rw-   1 antivirussystem      32768 Jun 21 19:18 _toks.new

------------------

I want to add more messages, but sa-learn cannot take any more and only
complains with "Out of memory!"

whereas spamprobe can add more messages:
[antivirus .spamprobe]$ ls -la
total 63108
drwxrwxr-x    2 antivirus system      4096 Jun  6 10:54 ./
drwx------   20 antivirus system      4096 Jun 12 16:47 ../
-rw-rw-r--    1 antivirus system         0 May 16 13:54 lock
-rw-------    1 antivirus system  64544768 Jun 22 06:41 sp_words
------------------

Can you fix the memory problem?
Comment 1 Justin Mason 2003-07-01 12:10:01 UTC
could you attach -D output from sa-learn?
Comment 2 Philip Choy 2003-07-01 18:16:16 UTC
$ sa-learn --ham --mbox SPAMNOT -D >&results.txt

...SNIP...

debug: bayes: Current scan count 77548 > 65535, resetting atimes to 0
Out of memory!
Learned from 31 messages.
Comment 3 Philip Choy 2003-07-01 18:16:56 UTC
Subject: Re:  sa-learn OOMing on many large mboxes

--------
$ sa-learn --ham --mbox SPAMNOT -D >&results.txt

...SNIP...

debug: bayes: Current scan count 77548 > 65535, resetting atimes to 0
Out of memory!
Learned from 31 messages.
-------

I have posted this here and emailed it at the same time.

Phil.

----- Original Message ----- 
From: <bugzilla-daemon@bugzilla.spamassassin.org>
To: <plchoy@income.com.sg>
Sent: Wednesday, July 02, 2003 3:12 AM
Subject: [Bug 2114] sa-learn OOMing on many large mboxes


> http://bugzilla.spamassassin.org/show_bug.cgi?id=2114
>
> jm@jmason.org changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>             Summary|Out of Memory!              |sa-learn OOMing on many
>                    |                            |large mboxes
> ------- You are receiving this mail because: -------
> You reported the bug, or are watching the reporter.

Comment 4 Justin Mason 2003-07-01 18:42:03 UTC
Subject: Re: [SAdev]  sa-learn OOMing on many large mboxes 


>...SNIP...

could you post this bit too? ;)

Comment 5 Justin Mason 2003-07-01 18:44:43 UTC
actually, I see why you're SNIPping.   there's a lot of tokens there.  No
need to post those, just SNIP out the "token" lines -- it's the "ntokens" etc.
data that's important.

BTW, what are the memory limits on the machine?  It sounds like it's reading
the entire db into RAM to perform the upgrade process, and that's the problem.
Comment 6 Theo Van Dinter 2003-07-01 18:55:07 UTC
Just to note:

debug: bayes: Current scan count 77548 > 65535, resetting atimes to 0

is not an upgrade.  It's the expire hack to fix the 16-bit atimes.

During the reset, the old tokens are sent directly into the new db.  However, there is a
'foreach my $tok (keys %{$self->{db_toks}}) {', and that will cause a temp array of all the tokens.  So ...
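
As a rough illustration of why that loop is costly, here is a sketch in Python rather than Perl. It is not SpamAssassin code: scan_tokens(), the record layout, and both reset functions are hypothetical stand-ins, and new_db is an in-memory dict standing in for the on-disk file. The first function builds a temporary list of every entry before looping, which is the behaviour described above for 'foreach ... keys'; the second handles one entry at a time.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical record layout: (spam_count, ham_count, atime).
Record = Tuple[int, int, int]


def scan_tokens(n: int) -> Iterator[Tuple[str, Record]]:
    """Stand-in for a database cursor: yields one (token, record) pair at a time."""
    for i in range(n):
        yield f"tok{i}", (i % 3, i % 5, 70000 + i)


def reset_atimes_materialized(n: int, new_db: Dict[str, Record]) -> int:
    """Build the full entry list first, then loop over it."""
    entries = list(scan_tokens(n))  # temporary list holding every entry
    for tok, (spam, ham, _atime) in entries:
        new_db[tok] = (spam, ham, 0)
    return len(entries)  # number of entries held in the temporary list


def reset_atimes_streaming(n: int, new_db: Dict[str, Record]) -> int:
    """Process one entry at a time; no temporary list is built."""
    held = 0
    for tok, (spam, ham, _atime) in scan_tokens(n):
        new_db[tok] = (spam, ham, 0)
        held = 1  # only the current entry is held by the loop
    return held
```

Both functions write the same result; only the first needs working memory that grows with the number of tokens in the db.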
Comment 7 Philip Choy 2003-07-01 19:28:25 UTC
Subject: Re:  sa-learn OOMing on many large mboxes

Never mind.. Posting entire attachment instead.

Using dual Alpha CPUs with 2GB RAM.

Thanx.

Phil.

Comment 8 Philip Choy 2003-07-01 20:04:21 UTC
Subject: Re:  sa-learn OOMing on many large mboxes

Doh.. Didn't you get the results.txt attachment here?

Phil.

Comment 9 Philip Choy 2003-07-01 22:19:10 UTC
Subject: Re:  sa-learn OOMing on many large mboxes

I don't know whether the attachment was posted, so I am copying and pasting it
here.

Hope it helps..

=======================
debug: Score set 0 chosen.
debug: running in taint mode? no
debug: using "/usr/local/share/spamassassin" for default rules dir
debug: using "/etc/mail/spamassassin" for site rules dir
debug: using "/home/av/.spamassassin/user_prefs" for user prefs file
debug: bayes: 396824 tie-ing to DB file R/O /home/av/mail/bayes2/_toks
debug: bayes: 396824 tie-ing to DB file R/O /home/av/mail/bayes2/_seen
debug: debug: Only 26 ham(s) in Bayes DB < 200
debug: bayes: 396824 untie-ing
debug: bayes: 396824 untie-ing db_toks
debug: bayes: 396824 untie-ing db_seen
debug: Score set 0 chosen.
debug: Initialising learner
debug: Initialising learner
debug: Learning Ham
debug: bayes: 396824 untie-ing
debug: lock: 396824 created /home/av/mail/bayes2/.lock.vxfx.396824
debug: lock: 396824 trying to get lock on /home/av/mail/bayes2/ with 0 retries
debug: lock: 396824 link to /home/av/mail/bayes2/.lock: link ok
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_toks
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_seen
debug: 1056283230@vxfx.income.com.sg: already learnt correctly, not learning twice
debug: Removing Markup
debug: Learning Ham
debug: bayes: 396824 untie-ing
debug: bayes: 396824 untie-ing db_toks
debug: bayes: 396824 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 396824 unlink /home/av/mail/bayes2/.lock
debug: lock: 396824 created /home/av/mail/bayes2/.lock.vxfx.396824
debug: lock: 396824 trying to get lock on /home/av/mail/bayes2/ with 0 retries
debug: lock: 396824 link to /home/av/mail/bayes2/.lock: link ok
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_toks
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_seen
debug: 200306220651.h5M6p9xH369368@vxfx.income.com.sg: already learnt correctly, not learning twice
debug: Removing Markup
debug: Learning Ham
debug: bayes: 396824 untie-ing
debug: bayes: 396824 untie-ing db_toks
debug: bayes: 396824 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 396824 unlink /home/av/mail/bayes2/.lock
debug: lock: 396824 created /home/av/mail/bayes2/.lock.vxfx.396824
debug: lock: 396824 trying to get lock on /home/av/mail/bayes2/ with 0 retries
debug: lock: 396824 link to /home/av/mail/bayes2/.lock: link ok
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_toks
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_seen
debug: 200306220300.h5M306xH323036@vxfx.income.com.sg: already learnt correctly, not learning twice
debug: Removing Markup
debug: Learning Ham
debug: bayes: 396824 untie-ing
debug: bayes: 396824 untie-ing db_toks
debug: bayes: 396824 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 396824 unlink /home/av/mail/bayes2/.lock
debug: lock: 396824 created /home/av/mail/bayes2/.lock.vxfx.396824
debug: lock: 396824 trying to get lock on /home/av/mail/bayes2/ with 0 retries
debug: lock: 396824 link to /home/av/mail/bayes2/.lock: link ok
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_toks
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_seen
debug: 200306220312.h5M3CcxH306926@vxfx.income.com.sg: already learnt correctly, not learning twice
debug: Removing Markup

.... SNIP ...

debug: Learning Ham
debug: bayes: 396824 untie-ing
debug: bayes: 396824 untie-ing db_toks
debug: bayes: 396824 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 396824 unlink /home/av/mail/bayes2/.lock
debug: lock: 396824 created /home/av/mail/bayes2/.lock.vxfx.396824
debug: lock: 396824 trying to get lock on /home/av/mail/bayes2/ with 0 retries
debug: lock: 396824 link to /home/av/mail/bayes2/.lock: link ok
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_toks
debug: bayes: 396824 tie-ing to DB file R/W /home/av/mail/bayes2/_seen
debug: tokenize: header tokens for *p = "<updates@lolfun.com>"
debug: tokenize: header tokens for *m = " 200306271322 h5RDLxGs110315 vxfx income com sg "
debug: tokenize: header tokens for *F = "LOLFUN <updates@lolfun.com>"
debug: tokenize: header tokens for To = "eloke@income.com.sg"
debug: tokenize: header tokens for X-CID = "8777838"
debug: tokenize: header tokens for X-ENVID = "FGC-30641"
debug: tokenize: header tokens for X-Keywords = ""
debug: tokenize: header tokens for *r = "  MAIL3065 (127.0.0) by MAIL3065.flowgo.com (PowerMTA(TM) v1.5); envelope-  <updates@lolfun.com>)"
debug: tokenize: header tokens for *r = "  MAIL3065 (127.0.0) by MAIL3065.flowgo.com (PowerMTA(TM) v1.5); envelope-  <updates@lolfun.com>) MAIL3065.flowgo.com (mail3065.flowgo.com [12.129.205]) by vxfx.income.com.sg (8.12.9/8.11.6) <eloke@income.com.sg>; "
debug: bayes: Current scan count 77548 > 65535, resetting atimes to 0
Out of memory!
Learned from 31 messages.




Comment 10 Daniel Quinlan 2003-08-21 23:39:37 UTC
assigning to 2.61
Comment 11 Malte S. Stretz 2003-08-22 13:57:20 UTC
*** Bug 2232 has been marked as a duplicate of this bug. ***
Comment 12 Theo Van Dinter 2003-09-27 11:21:30 UTC
Can you please try out 2.60 and see if you have the same issue?  The Bayes 
backend was overhauled, so you will hopefully not have any problems.

Moving to 2.70
Comment 13 Marc Ochs 2003-12-28 11:40:05 UTC
sa-learn --nonspam on a 1,100 message mailbox just OOMed my 512MB RAM 
1GB swap dual athlon.

This is with a new SpamAssassin 2.6.1. 
Comment 14 Theo Van Dinter 2004-02-15 17:54:59 UTC
ok, let's step back a second...

there are 2 issues here as far as I can see:

1) sa-learn mode in ArchiveIterator causes the list of messages to use to be 
stored in memory ...  So sa-learn first goes through all the files, finds the 
offsets for each message, and stores that in memory plus some metadata.  So if 
you're learning a huge # of messages, that can be pretty big.

2) expire/update and the various hack atime reset code were using foreach() in 
earlier versions of SA, which caused (iirc) a temp array to be created in memory 
with the full list of tokens in the DB, which can be pretty big too.


#1 still happens, but you don't seem to be getting OOMed at that stage.  #2's 
expiry/etc issue was fixed somewhere in the 2.6x series (2.62 I think), and the 
atime hackery is no longer done post 2.5x.

So I guess what I'm trying to get to is this: Does the problem still occur with 
2.63?  If you can, does it occur with 3.0.0?

If it still occurs, we'll have to do a bit more debugging.  #1 is easily 
solvable, at least on UNIX -- we already do it for mass-check runs (store the 
list of message offsets in a temp file and do some fork() fun to avoid the 
memory hit over the learning period).  It's sort of a hack, which is why it's 
not the default for sa-learn.  But I'm guessing this isn't your problem.  For 
me, the storage for ~200k message offsets only adds ~60MB to the process memory 
usage in testing/mass-check.

If #2 is occurring...  more debugging. ;)
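
For issue #1, the temp-file idea can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the ArchiveIterator or mass-check code: index_mbox() and iter_messages() are hypothetical names, the fork() part is left out, and a message boundary is taken to be any line starting with "From ", which is a simplification of real mbox parsing. The point is only that the offsets go to a file instead of a list held in memory.

```python
import tempfile
from typing import BinaryIO, Iterator


def index_mbox(mbox: BinaryIO, index: BinaryIO) -> int:
    """Write the byte offset of each 'From ' separator line to index, one per line."""
    count = 0
    offset = 0
    for line in mbox:
        if line.startswith(b"From "):  # simplified boundary test (assumption)
            index.write(b"%d\n" % offset)
            count += 1
        offset += len(line)
    return count


def iter_messages(mbox: BinaryIO, index: BinaryIO) -> Iterator[bytes]:
    """Yield messages one at a time, reading the offsets back from the index file."""
    index.seek(0)
    prev = None
    for raw in index:
        start = int(raw)
        if prev is not None:
            mbox.seek(prev)
            yield mbox.read(start - prev)
        prev = start
    if prev is not None:
        mbox.seek(prev)
        yield mbox.read()  # the last message runs to the end of the file


def learn_from(path: str) -> int:
    """Walk an mbox file message by message; returns the number of messages seen."""
    seen = 0
    with open(path, "rb") as mbox, tempfile.TemporaryFile() as index:
        index_mbox(mbox, index)
        for _message in iter_messages(mbox, index):
            seen += 1  # a real learner would tokenize the message here
    return seen
```

Only one offset and one message are held at a time, so the memory used for bookkeeping no longer depends on how many messages are in the mboxes.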
Comment 15 Daniel Quinlan 2004-08-27 17:35:56 UTC
bugs for 3.0.1 milestone
Comment 16 Theo Van Dinter 2004-09-28 18:16:45 UTC
not hearing anything in 7 months, I'm closing the ticket as wfm.  if the problem still occurs with a 
released 3.0 version, we can reopen and investigate.