|
SA Bugzilla – Full Text Bug Listing |
Summary: | spamc --randomize/-H switch broken | ||
---|---|---|---|
Product: | Spamassassin | Reporter: | Kelsey Cummings <kgc> |
Component: | spamc/spamd | Assignee: | SpamAssassin Developer Mailing List <dev> |
Status: | NEW --- | ||
Severity: | normal | CC: | apache, jquinn+SAbug, kgc, kmcgrail, rwmaillists, williamt |
Priority: | P2 | ||
Version: | 3.4.0 | ||
Target Milestone: | Future | ||
Hardware: | PC | ||
OS: | Linux | ||
Whiteboard: |
Description
Kelsey Cummings
2014-05-31 00:45:29 UTC
Kelsey, please describe the issue in detail and add information on how to reproduce. * In what way does --randomize not work? What is the behavior you observe, and what do you expect instead? * Are there any other spamc options used? * Is DNS involved? What does it resolve to, single or multiple IP addresses? Resetting version to unspecified. The originally reported version of 3.4.2 does not exist. Please update with the correct version. --randomize appears to have no affect ;) This is easily reproducible with any DNS record that returns multiple A records or, presumably, multiple servers passed in with -d # strace -econnect spamc -d spam-cluster -H < test.msg 2>&1 | grep 783 Will display the hosts tried in a order consistent with how they are returned by the NS request - this is particularly problematic for users where their name server does not shuffle multiple records (powerdns.) eg: connect(3, {sa_family=AF_INET, sin_port=htons(783), sin_addr=inet_addr("1.1.221.234")}, 16) = 0 connect(3, {sa_family=AF_INET, sin_port=htons(783), sin_addr=inet_addr("1.1.208.75")}, 16) = 0 connect(3, {sa_family=AF_INET, sin_port=htons(783), sin_addr=inet_addr("1.1.208.73")}, 16) = 0 connect(3, {sa_family=AF_INET, sin_port=htons(783), sin_addr=inet_addr("1.1.221.236")}, 16) = 0 connect(3, {sa_family=AF_INET, sin_port=htons(783), sin_addr=inet_addr("1.1.208.76")}, 16) = 0 connect(3, {sa_family=AF_INET, sin_port=htons(783), sin_addr=inet_addr("1.1.221.233")}, 16) = 0 connect(3, {sa_family=AF_INET, sin_port=htons(783), sin_addr=inet_addr("1.1.221.235")}, 16) = 0 connect(3, {sa_family=AF_INET, sin_port=htons(783), sin_addr=inet_addr("1.1.208.74")}, 16) = 0 I was just looking at the code and it appear that it doesn't attempt to randomize the order, it just rotates the list a random number of time. This may be a bit confusing if you are expecting proper randomization. Irrespective of whether it's working as intended, this is a poor way of doing load balancing. For example if there are four ip addresses A,B,C,D and A and B are offline, C gets three times as much load as D. (In reply to RW from comment #3) > I was just looking at the code and it appear that it doesn't attempt to > randomize the order, it just rotates the list a random number of time. This > may be a bit confusing if you are expecting proper randomization. > > Irrespective of whether it's working as intended, this is a poor way of > doing load balancing. For example if there are four ip addresses A,B,C,D and > A and B are offline, C gets three times as much load as D. I believe you are misunderstanding what the switch does. -H, --randomize For TCP/IP sockets, randomize the IP addresses returned for the hosts given by the -d switch. This provides for a simple kind of load balancing. It will try only three times though. if you do -d host1,host2,host3... it randomizes those hosts. It does not randomize the A records if you have a host1 that gives 3 A records, for example. Are you proposing that you would like it to randomize all entries at a weight of 1 whether explicit or from an A/AAAA record look up? Regards, KAM Regardless, the 3.3.2 spamc client, used with identical arguments, distributes load evenly amongst the multiple servers returned by a host record with multiple IP addresses. The 3.4.0 client no longer exhibits this desirable behavior. What Kelsey is saying is true. Ever since we upgraded to 3.4.0 the -H flag to spamc has no effect when multiple ip addresses are returned. It always keeps the first one it gets. This appears to be the code that broke -H. The for loop is the old behavior. If you comment out the tp->hosts line and un-comment the for loop you will get the old behavior. /* treat all A or AAAA records of each host as one entry */ tp->hosts[tp->nhosts++] = res; /* alternatively, treat multiple A or AAAA records of one host as individual entries */ /* for (addrp = res; addrp != NULL; ) { tp->hosts[tp->nhosts] = addrp; addrp = addrp->ai_next; /* before NULLing ai_next */ tp->hosts[tp->nhosts]->ai_next = NULL; tp->nhosts++; } */ Thanks. Likely someone adding quad A record support and not realizing the rest of the loop impact is my off the cuff guess. Overall, I believed Kelsey and you've helped that I don't have to find the code difference. But I was on the same path because it's part of this svn change: http://mail-archives.apache.org/mod_mbox/spamassassin-commits/201210.mbox/%3C20121002001948.25E0523888E4@eris.apache.org%3E So apparently he matched the -H to the documentation. Mark, do you remember why you did this? Do we need to add yet another option for the more truly randomization? regards, KAM The old behaviour wouldn't have worked optimally with multiple host entries that are multi-homed, since it tries 3 consecutive IP addresses rather than 3 consecutive hosts. IMO it would be better to leave that code as it is and fix the randomization function so it moves a random 3 host entries to the beginning of the array and then randomizes the address list for those entries. Note, you can see from the above strace output above that all the IP addresses for a host are tried at present. It would be better to avoid random rotations altogether. Pushing to 3.4.3. No patch to fix and not a true blocker Postponing into future |