SA Bugzilla – Bug 4157
Reducing System Load with Temporary Rejections - Penalty Box
Last modified: 2005-02-25 10:25:10 UTC
I've discovered a trick that has significantly reduced the system load when using SpamAssassin, and I think the idea should be incorporated into SA and done better than I am doing it. Often a spammer sends the same spam over and over to different people, and SA correctly identifies it each time - but at a cost in system load. Sometimes spammers pound the server over and over with dictionary and various other attacks. This suggestion is geared to reducing the load on the system by slowing the spammers down with temporary errors, using what I call a penalty box.

The idea is that once a from address has sent a spam, any email from that address will get a temporary error (come back later) from the MTA for the next 5 minutes. If the sender is sending ham, the message will eventually get through. But in many cases spammers make only one attempt and move on.

I'm using Exim, and most of what I'm doing is at the MTA level. Basically, spammers are put into a temporary blacklist that is used to return temporary errors. Sometimes I put the IP address in a similar list to return temporary errors. Every 5 minutes the list is emptied from a cron job. And it is working very well in reducing the load of having to process the same spam over and over, as well as reducing the load of other "sins" that spammers commit.

So - how does this tie into SpamAssassin? It would be handy if SA could maintain a short-lived database (DB file? text file?) containing a list of recent spammers or spam information in a form that can be read from Exim or other MTAs - or SA itself - for the purpose of reducing system load from spammers that hammer the server over and over in a short period of time. It's a sort of recent sinners list and can contain either from addresses or IP addresses of offenders.

This is similar in many ways to greylisting, but with greylisting you penalize everyone new with delays. This method delays only those who have previously offended.
It isn't as effective as greylisting in some ways - but it eliminates the delays greylisting causes on new ham that I consider to be unacceptable. The penalty box idea is working very well for me and it gets rid of some nasty load spikes that used to hit pretty hard. I think it's worth considering ways to reduce load by reducing the number of messages SA has to process.
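A minimal sketch of the penalty box idea, in Python for illustration only. The names `PenaltyBox`, `add`, `is_penalized` and `smtp_decision` are hypothetical, and the 451 reply text is an assumed example. The setup described above uses Exim lists emptied by a cron job every 5 minutes; this sketch instead expires each entry individually after 300 seconds, which is a swapped-in variation and not the original mechanism.

```python
import time


class PenaltyBox:
    """Short-lived list of recent offenders (sender addresses or IPs)."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds      # assumed 5 minute penalty, as in the report
        self.clock = clock          # injectable so the expiry can be tested
        self._expires = {}          # key -> time at which the penalty ends

    def add(self, key):
        """Record an offender; a repeat offence restarts the penalty."""
        self._expires[key.lower()] = self.clock() + self.ttl

    def is_penalized(self, key):
        """Return True while the key is still inside its penalty window."""
        key = key.lower()
        expiry = self._expires.get(key)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expires[key]  # penalty served, forget the entry
            return False
        return True


def smtp_decision(box, sender, client_ip):
    """Return a temporary (4xx) reply for boxed senders, else accept."""
    if box.is_penalized(sender) or box.is_penalized(client_ip):
        return "451 Temporary failure, please try again later"
    return "250 OK"
```

In use, the MTA hook would call `box.add(sender)` (and possibly `box.add(client_ip)`) after a message scores as spam, and consult `smtp_decision` before handing later messages to SA, so repeat deliveries inside the window never reach the scanner.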
My first reaction was that this would be a great idea for my ISP to use, and I started to write a letter to a sysadmin there to suggest it. While writing it I realized that it would put the system behavior at the mercy of individual users' local configuration options. At a minimum, you should not have a system-wide temporary fail of email because of a hit on somebody's personal blacklist. Similarly, people could have local scores set high for some specific rules. I know of people who are so against HTML in email that they have set scores of some SA rules that only hit when there is HTML to 1000. That may not make a lot of sense from a pure spam-filtering point of view, but they do it. How could this idea work in an ISP environment where users have the ability to set their local options?
Perhaps a small modification to SpamAssassin could produce two scores. One would be the score produced by the system-wide rules, and the other would include the user's rules and modifications. This would still be one pass; you just keep two running totals as you go through the rules. You would need a new header to display the score for the system-wide rules, and would not normally need to display the details of how this score was arrived at.
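A rough sketch of the two-running-totals idea, in Python for illustration only. This is not SpamAssassin's API: `score_message`, `system_scores` and `user_scores` are hypothetical names, and the score values in the usage example are made up. It assumes the list of rules that hit is already known and that a user score, when present, overrides the system score for that rule.

```python
def score_message(hits, system_scores, user_scores):
    """Return (system_total, user_total) in a single pass over the rule hits.

    hits          -- names of the rules that matched the message
    system_scores -- site-wide score per rule (hypothetical input)
    user_scores   -- per-user overrides and user-only rules (hypothetical input)
    """
    system_total = 0.0
    user_total = 0.0
    for rule in hits:
        # Rules that exist only in the user's config add nothing site-wide.
        base = system_scores.get(rule, 0.0)
        system_total += base
        # The user's total uses the override if there is one.
        user_total += user_scores.get(rule, base)
    return system_total, user_total


# Illustrative values only: one user has rescored an HTML rule to 1000.
system_total, user_total = score_message(
    hits=["HTML_MESSAGE", "USER_IN_BLACKLIST"],
    system_scores={"HTML_MESSAGE": 0.1},
    user_scores={"HTML_MESSAGE": 1000.0, "USER_IN_BLACKLIST": 100.0},
)
```

Under these assumptions only `system_total` would feed a site-wide decision such as a temporary rejection, while `user_total` would still drive that user's own spam tagging.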
Subject: Re: Reducing System Load with Temporary Rejections - Penalty Box

bugzilla-daemon@bugzilla.spamassassin.org wrote:
> http://bugzilla.spamassassin.org/show_bug.cgi?id=4157

The idea isn't to bounce messages system-wide. It's only to delay senders for a 5 minute period. And it would be triggered by high scores.
I understand that the effect is not as drastic as a bounce, and I agree that a temp fail penalty box can be a great idea -- especially since spam senders usually do not retry temp fails. The problem is that you can't count on a high score really meaning that some piece of mail is spam when individual users are able to set their own preferences. They can blacklist anyone they choose, they can assign high scores to rules that would FP for everyone else, they can even rescore everything and set their spam threshold to 100 if they feel like it. The only way this could work without making it vulnerable to individual user preferences is to have some mechanism to keep track of rule hits and scores using the default scoring. To do that, you would have to deal with the problem that assigning a rule a score of 0 now means that the rule is never run. That's a problem with Tom's suggestion of keeping two running totals. I'm not saying that the idea is not feasible, but I'm raising issues that I think an implementation would need to address if it is going to be practical. I agree that the goal of finding a way to temporarily temp fail likely spam senders is a good one.
This is not SA's job.
Subject: Re: Reducing System Load with Temporary Rejections - Penalty Box

> The problem is that you can't count on a high
> score really meaning that some piece of mail is spam when individual users are
> able to set their own preferences.

So implement it first only for sites that don't allow individual user rules or scores. That way the site rules prevail, because they are the only ones around. At a guess, that might be 60% of the SA sites. After that, yes, some additional trickery would be needed to remove the effect of user rules.
Subject: Re: Reducing System Load with Temporary Rejections - Penalty Box

I don't think that additional trickery is necessary, because it's only a 5 minute temp error, so if something gets it wrong it's only a 5 minute problem and no real email is lost. The beauty of this system is that you don't have to be precise. If you get it wrong - no big deal. The whole point of this is to reduce system load without sacrificing any email, and the short delays on false positives are not significant and far less of a delay than greylisting. This is about keeping it simple.