Bug 734 - PYZOR check fails under spamd
Status: RESOLVED FIXED
Product: Spamassassin
Classification: Unclassified
Component: Rules (Eval Tests)
Version: 2.40CVS
Hardware: Other other
Importance: P2 major
Target Milestone: ---
Assignee: SpamAssassin Developer Mailing List
Reported: 2002-08-26 05:19 UTC by Michael Moncur
Modified: 2002-08-26 21:49 UTC

Description Michael Moncur 2002-08-26 05:19:01 UTC
If PYZOR_CHECK is enabled, messages larger than roughly 5K cause spamd to stop 
processing the message before finishing. The message is sent back to spamc with 
no SpamAssassin headers. This has caused a bunch of false negatives: spam that 
came through apparently untouched by SA.

Details below. If someone else can duplicate this, I recommend disabling Pyzor 
checking until this is figured out. I guess it could also be something specific 
to my system.

The following is the tail end of spamd's debug output when this happens. This 
particular test message is 9156 bytes long.
...
debug: DCC is available: 
debug: DCC: got response: X-DCC-sackHeads-Metrics: host.yrex.com 1012; Body=7 Fuz1=7 Fuz2=7

debug: Pyzor is available: 
debug: Pyzor -> check failed - Broken pipe.
debug: forged_rcvd_trail: entry 0: by=(undef) from=(undef) mismatches=0
debug: forged_rcvd_trail: entry 1: by=starlingtech.com from=gmtiresponder.com mismatches=0
debug: DNS MX records found: 4
-----(it ends here. it shouldn't.)

When the same message is run through spamc with the PYZOR_CHECK score set to 
zero, spamd successfully marks the message as spam:
....
debug: DCC is available: 
debug: DCC: got response: X-DCC-sackHeads-Metrics: host.yrex.com 1012; Body=8 Fuz1=8 Fuz2=8

debug: forged_rcvd_trail: entry 0: by=(undef) from=(undef) mismatches=0
debug: forged_rcvd_trail: entry 1: by=starlingtech.com from=gmtiresponder.com mismatches=0
debug: DNS MX records found: 4
debug: running meta tests; score so far=13.2
debug: AWL active, pre-score: 14.9, mean: undef, originating-ip: 216.10.23.150
debug: Post AWL score: 14.9
debug: is spam? score=14.9 required=7 tests=COMPLETELY_FREE,CTYPE_JUST_HTML,HTML_50_70,HTML_FONT_COLOR_NAME,HTML_FONT_COLOR_RED,HTML_FONT_COLOR_YELLOW,HTML_FONT_FACE_ODD,HTML_WITH_BGCOLOR,JAVASCRIPT,MAILTO_LINK,NO_REAL_NAME,SPAM_PHRASE_00_01,SUBJECT_IS_NEWS,TABLE_THICK_BORDER
logmsg: identified spam (14.9/7.0) for root:99 in  28 seconds, 9156 bytes.
-----(it ends here, correctly.)

I did a bit more testing and the following may help narrow it down:
- Editing the same message to be below about 5K in size makes spamd work fine 
even with Pyzor enabled. I haven't determined the exact size where this starts 
to happen.
- This only happens with spamc/spamd. `spamassassin' works fine.
- It's not a specific message, I've tried several.
Comment 1 Justin Mason 2002-08-26 08:31:17 UTC
bugger, this one is serious :(   we should really try to nobble
it before 2.40 release.

I think it's probably something to do with forking a subprocess
(i.e. pyzor-check), and perl's IO buffering. Note also that Mike is
using DCC, too, which also forks and execs DCC's checker tool.

Mike -- what perl version and OS are you using BTW, in case that's
relevant?
Comment 2 Michael Moncur 2002-08-26 14:55:03 UTC
I'm using Perl 5.6.1 under Red Hat Linux 7.1.
Comment 3 Justin Mason 2002-08-26 15:18:19 UTC
excellent!  I think this is fixed now, in b2_4_0.  It was indeed a buffering
issue caused by the use of open2(), so I fixed it for dcc as well, just in
case.  Mike, could you check?
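
The failure mode discussed here, a parent writing a whole message into a child's stdin while the child may already have given up and exited, can be sketched in Python. This is only an illustration under stated assumptions, not the Perl change committed to b2_4_0: `run_checker` and the stand-in child command are hypothetical names. `subprocess.Popen.communicate()` writes the input and drains the output together, and treats a broken pipe on the write side as non-fatal, so the parent still receives whatever the child printed.

```python
import subprocess
import sys


def run_checker(cmd, message, timeout=10.0):
    """Feed `message` (bytes) to `cmd` on stdin and return (exit_code, output).

    Hypothetical helper for illustration; it is not part of SpamAssassin.
    """
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold error text into the same stream
    )
    try:
        # communicate() interleaves writing and reading, so neither side
        # stalls on a full pipe, and an early child exit does not raise.
        output, _ = proc.communicate(input=message, timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        output, _ = proc.communicate()
    return proc.returncode, output


if __name__ == "__main__":
    # Stand-in for a checker that complains and exits without reading stdin.
    child = [
        sys.executable,
        "-c",
        "print('environment variable HOME is unset; please set it')",
    ]
    code, output = run_checker(child, b"x" * 9156)
    print(code, output.decode().strip())
```

The 9156-byte payload mirrors the test message from the report. The point of the sketch is that the caller sees the child's complaint as ordinary output instead of dying on the write.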
Comment 4 Michael Moncur 2002-08-26 20:19:30 UTC
Hmmm, I'm still getting "Broken pipe" on Pyzor but the spam is scored correctly 
now. I'm going to run some tests now to see if Pyzor is working on my system.

debug: Pyzor is available: 
debug: Pyzor -> check failed - Broken pipe.
debug: forged_rcvd_trail: entry 0: by=(undef) from=(undef) mismatches=0
debug: forged_rcvd_trail: entry 1: by=starlingtech.com from=gmtiresponder.com mismatches=0
debug: DNS MX records found: 4
debug: running meta tests; score so far=13.2
debug: AWL active, pre-score: 14.9, mean: undef, originating-ip: 216.10.23.150
debug: Post AWL score: 14.9
debug: is spam? score=14.9 required=7 tests=COMPLETELY_FREE,CTYPE_JUST_HTML,HTML_50_70,HTML_FONT_COLOR_NAME,HTML_FONT_COLOR_RED,HTML_FONT_COLOR_YELLOW,HTML_FONT_FACE_ODD,HTML_WITH_BGCOLOR,JAVASCRIPT,MAILTO_LINK,NO_REAL_NAME,SPAM_PHRASE_00_01,SUBJECT_IS_NEWS,TABLE_THICK_BORDER
logmsg: identified spam (14.9/7.0) for root:99 in   2 seconds, 9156 bytes.
Comment 5 Michael Moncur 2002-08-26 20:35:53 UTC
Did some more checking. I'm still getting "broken pipe" on messages > ~5K. On 
smaller messages, I get a different error in the debug output:


debug: Pyzor is available: 
debug: Pyzor: got response: environment variable HOME is unset; please set it
debug: Pyzor: couldn't grok response "environment variable HOME is unset; please set it
"

In either case, I never get a result for PYZOR_CHECK, even on messages that are 
definitely listed. HOME is definitely set correctly when I start spamd. The 
Pyzor check works fine with spamassassin -t.

Just to clarify, the original problem (spamd failing altogether on the Pyzor 
check) seems to be fixed, but Pyzor checking under spamc/spamd still seems 
broken.
Comment 6 Malte S. Stretz 2002-08-27 02:37:43 UTC
I think we change the $HOME somewhere in the Razor code to be a directory in 
/tmp. Maybe there's a problem... 
Comment 7 Justin Mason 2002-08-27 04:36:37 UTC
Yes, looks like it's because $HOME is reset when spamd is started,
and Pyzor relies on it.  Can anyone think *why* we do this??

What we can do is copy $ENV{'HOME'} when spamd starts, then set
$ENV{'HOME'} in the Pyzor check to that value for the duration
of the check.

I've just implemented this for Razor, Pyzor et al, and it seems
to fix it.
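
The save-and-restore approach described above can be sketched as follows. This is a Python analogue under stated assumptions, not the Perl code that went into spamd: the `saved_home` context manager and the scratch path are hypothetical, introduced only for illustration. The idea is to take a copy of HOME at startup, put it back just for the duration of the external check, and then return to whatever value the daemon was using.

```python
import os
from contextlib import contextmanager


@contextmanager
def saved_home(original):
    """Temporarily set HOME to `original`, then restore the current value.

    Hypothetical helper; `original` is the copy of HOME taken at startup.
    """
    current = os.environ.get("HOME")
    if original is not None:
        os.environ["HOME"] = original
    try:
        yield
    finally:
        # Put back whatever HOME was before the check, including "unset".
        if current is None:
            os.environ.pop("HOME", None)
        else:
            os.environ["HOME"] = current


if __name__ == "__main__":
    startup_home = os.environ.get("HOME")      # copy taken at daemon start
    os.environ["HOME"] = "/tmp/scratch-home"   # daemon later overrides HOME
    with saved_home(startup_home):
        # A child process started here inherits the original HOME.
        print("HOME during check:", os.environ.get("HOME"))
    print("HOME afterwards:", os.environ.get("HOME"))
```

Restoring in a `finally` block keeps the override scoped to the check even if the check itself raises.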

Comment 8 Malte S. Stretz 2002-08-27 04:45:23 UTC
I think it was done because otherwise Razor put its razor.lst into / when SA is 
called as root or without $HOME. Don't ask me for details. There might be a 
bug about this somewhere here in zilla... 
Comment 9 Michael Moncur 2002-08-27 05:20:13 UTC
I can confirm that this is fixed in b2_4_0 - spamd now successfully scores 
Pyzor on messages both < 5K and > 5K. Nice work!
Comment 10 Justin Mason 2002-08-27 05:49:59 UTC
excellent! closing bug.