SA Bugzilla – Bug 6057
redirector_pattern doesn't explain what it does
Last modified: 2010-11-16 13:00:28 UTC
"""""""""""""""" redirector_pattern A regex pattern that matches both the redirector site portion, and the target site portion of a URI. Note: The target URI portion must be surrounded in parentheses and no other part of the pattern may create a backreference. Example: http://chkpt.zdnet.com/chkpt/whatever/spammer.domain/yo/dude redirector_pattern /^https?:\/\/(?:opt\.)?chkpt\.zdnet\.com\/chkpt\/\w+\/(.*)$/i """"""""""""""""""" 1) What happens when the pattern is matched? 2) Perhaps this example is perl code. Could this example be explained (what does it do?) 3) Why would this option be helpful? Please remember that people reading the documentation are trying to understand how to use SA and may not know perl. Thank you See 6056: +++++++++++++++++++++++++++++++ Abused by spammers http://search.yahoo.com/search?y=Search&p=flewduet%2ecom&fr=sfp&ei=UTF-8 Please add redirector_pattern for this pattern to allow effective URI BL hits. +++++++++++++++++++++++++++++++
Pls do not hijack a bug for a help/support question which belongs in sa users list (In reply to comment #0)
(in reply to comment #1) I think we can take the kinder view that reporter intended to file this as a doc bug, not "hijacking a bug" for a support question. (in reply to the bug report) However that doesn't mean I fully agree. The doc you quoted is in a section titled "Rule Definitions and Privileged Settings". Defining your own rules or modifying those settings requires a certain minimum of expertise in working with perl regexps. If you can't follow the perl regexp example, then you should not be writing your own. On the other hand, it may be a fair criticism that nowhere in the documentation does it explain why you might want to write a redirector_pattern or what uses it. I think my first cuts at writing an explanation are too wordy. Can anyone suggest a one sentence or less addition to the existing documentation that makes it clear what redirector patterns are used for?
(In reply to comment #2) > (in reply to comment #1) > > I think we can take the kinder view that reporter intended to file this as a > doc bug, not "hijacking a bug" for a support question. I apologize
Mr. Markowitz, Thank you for your kindness. I agree that the area of rule definition is in the expert area, expert in SA that is. I acknowledge that a very good understanding of perl regexp is necessary. The section "redirector_pattern" seems far too brief, and an explanation of the regexp and the results would be very helpful in understanding what the rule did, what its results were, and why you would want this feature.
Ok, here is a cut at making the documentation more clear. I welcome any suggestions for better language, but if there are none in the next week or so I will update the doc anyway. redirector_pattern A regex pattern that extracts a URI obfuscated by being the target of a redirector site. The pattern must match the entire redirector plus target. The target URI is surrounded in parentheses. There must be no other backreferences generated. Example: http://chkpt.zdnet.com/chkpt/whatever/http://spammer.domain/yo/dude is deobfuscated by redirector_pattern /^https?:\/\/(?:opt\.)?chkpt\.zdnet\.com\/chkpt\/\w+\/(.*)$/i
Mr. Markowitz, Thank you for the explanation. I knew what a redirector is but it didn't click ( I think obfuscated was the keyword) . This is a great example of a rule someone would want to write. http://chkpt.zdnet.com/chkpt/whatever/http://spammer.domain.cc/secure.com /^https?:\/\/(?:opt\.)?chkpt\.zdnet\.com\/chkpt\/\w+\/(.*)$/i string beginning with http and maybe an s then :// maybe a ( then :opt. maybe a ) then chkpt.zdnet.com/chkpt/ then one or more words followed by / a ( and all the rest of the characters to the end of the line that is the phisher's actual target URI ignoring case ((I am unclear regarding the (:opt) portion?? )) (( could you explain it? )) A help would be a reference to www.spaweditor.com/scripts/regex and regex.powertoy.org
Dennis, this bug is about the description being too vague and low on information, which should be enhanced by adding some details. It is however *not* about explaining and dissecting the given example RE. Seriously, that whole section of the docs is almost exclusively for admins. It's taken for granted that an admin who is going to write custom rules *does* understand REs already. Let alone more esoteric stuff like redirector_pattern, which frankly only rarely needs to be used by anyone. Moreover, IMHO adding references to RE tutorials is out of the scope of the entire SA documentation, more so for this specific option. Any reference, if at all, should be 'man perlre' anyway. ;) If you need help understanding the example, the users mailing list is the appropriate place to ask. Bugzilla is for handling and tracking bugs -- not usage questions. FWIW, your dissection of the example RE isn't correct. One hint: (?:foo) is for grouping without creating backreferences. See the note in the description about backreferences being forbidden -- because the single backreference is used to extract the one embedded, actual target URI. [1] That all said, I'm still kind of wondering why you are poking at redirector_pattern in the first place. Any need for that, or would a simple rule do? [1] Which maybe could be added to the description, just to explain why other backreferences are not allowed. Oh, and why does FF spell-checking not know the word "backreference"? ;)
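For readers following the (?:foo) hint, here is a minimal sketch of the difference between a capturing and a non-capturing group. It uses Python's `re` module purely as a stand-in, on the assumption that `(?:...)` behaves the same way there as in Perl; the sample pattern and the string "foobarbaz" are made up for illustration and come from neither SpamAssassin nor the docs.

```python
import re

# Hypothetical sample input and patterns, for illustration only.
sample = "foobarbaz"

# "(foo)?" is a capturing group, so it becomes group 1 and pushes
# the part we actually want into group 2.
with_capture = re.match(r"^(foo)?bar(.*)$", sample)

# "(?:foo)?" groups without capturing, so the wanted part stays group 1.
without_capture = re.match(r"^(?:foo)?bar(.*)$", sample)

print(with_capture.groups())     # ('foo', 'baz')
print(without_capture.groups())  # ('baz',)
```

With the non-capturing form, the only backreference is the part after "bar", which is the property the redirector_pattern note asks for.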
Yes, explaining even esoteric aspects of regexps is outside the scope of a doc string in the code and of a bugzilla entry. I didn't realize that the reason to avoid any other backreferences would not be clear, but maybe something can be mentioned... redirector_pattern A regex pattern that extracts a URI obfuscated by being the target of a redirector site. The pattern must match the entire redirector plus target. The target URI is identified by surrounding it in parentheses to make it the one backreference generated by the pattern: No other may be generated. URI parsing checks every URI string against every redirector_pattern that is defined. Example: http://chkpt.zdnet.com/chkpt/whatever/http://spammer.domain/yo/dude is deobfuscated by redirector_pattern /^https?:\/\/(?:opt\.)?chkpt\.zdnet\.com\/chkpt\/\w+\/(.*)$/i
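As a rough illustration of the proposed wording, the example pattern can be tried outside SpamAssassin. The sketch below translates the Perl regex to Python's `re` syntax (the `/i` flag becomes `re.IGNORECASE`, and the slashes no longer need escaping); `extract_target` is a hypothetical helper name, not part of SpamAssassin.

```python
import re

# Python rendering of the Perl pattern from the doc example:
#   /^https?:\/\/(?:opt\.)?chkpt\.zdnet\.com\/chkpt\/\w+\/(.*)$/i
REDIRECTOR = re.compile(
    r"^https?://(?:opt\.)?chkpt\.zdnet\.com/chkpt/\w+/(.*)$",
    re.IGNORECASE,
)

def extract_target(uri):
    """Return the target captured by the single group, or None if no match."""
    m = REDIRECTOR.match(uri)
    return m.group(1) if m else None

print(extract_target(
    "http://chkpt.zdnet.com/chkpt/whatever/http://spammer.domain/yo/dude"
))  # http://spammer.domain/yo/dude
print(extract_target("http://example.com/not/a/redirector"))  # None
```

The pattern is anchored with ^ and $, so it has to cover the whole redirector URI, and the one parenthesized group yields the target.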
Another aspect of this option that might not be clear is its actual purpose. That is, the target URI will be used for URIBL lookups. Thus I propose the following, slightly expanded description, based on Sidney's proposal. A regex pattern that extracts a URI obfuscated by being the target of a redirector site. The target URI will be used for URIBL lookups. The pattern must match the entire redirector plus target. The target URI is identified by surrounding it in parentheses to make it the one backreference generated by the pattern. No other backreference may be generated. URI parsing checks every URI string against every redirector_pattern that is defined. Anyone with better English skills than me, who can eliminate the repeated use of "generated"? ;) Also, IMHO, this rarely used option is placed too prominently in the Rule Definitions. It probably should be moved to the end of the section.
A regex pattern that extracts the target URI from a redirector URI. The target URI will be used for URIBL lookups. The pattern must match the entire redirector URI including the target. The target URI is identified by surrounding it in parentheses to make it the first capturing backreference in the regex. Only one backreference is allowed, others should be made non-capturing. URI parsing checks every URI string against every redirector_pattern that is defined.
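To make the last two sentences of that description concrete, here is a sketch of how "every URI against every pattern" and "used for URIBL lookups" might fit together. This is not SpamAssassin's implementation, which the thread does not show; `PATTERNS`, `targets` and `lookup_host` are hypothetical names, and taking the host name of the target as the lookup key is an assumption made for the example.

```python
import re
from urllib.parse import urlsplit

# Hypothetical list of configured patterns; only the doc example is included.
PATTERNS = [
    re.compile(r"^https?://(?:opt\.)?chkpt\.zdnet\.com/chkpt/\w+/(.*)$", re.I),
]

def targets(uri, patterns=PATTERNS):
    """Check one URI against every pattern and collect the captured targets."""
    found = []
    for pattern in patterns:
        m = pattern.match(uri)
        if m:
            found.append(m.group(1))
    return found

def lookup_host(target):
    """Return the host name of a target (assumed here to be the lookup key)."""
    # Targets may lack a scheme, as in the original doc example, so add one
    # before parsing; urlsplit only finds the host after "//".
    if "://" not in target:
        target = "http://" + target
    return urlsplit(target).hostname

uri = "http://chkpt.zdnet.com/chkpt/whatever/http://spammer.domain/yo/dude"
for t in targets(uri):
    print(lookup_host(t))  # spammer.domain
```

Without the redirector pattern, a lookup would presumably only ever see the redirector's own host name rather than the spammer's.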
Thanks, Per, sounds good to me. With one nit-pick. (In reply to comment #10) > the first capturing backreference in the regex. Only one backreference is > allowed, others should be made non-capturing. "others" in this context logically refers to "backreference", no? A non-capturing backreference is a contradiction in terms, because the capturing is what makes it a backreference to begin with. The last part probably should explicitly mention parentheses and grouping instead. Only one backreference is allowed, other grouping use of parentheses should be made non-capturing.
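The "only one backreference" rule is easy to check mechanically. The sketch below counts capturing groups with Python's `re` as a stand-in for Perl; `capture_count` is a hypothetical helper, and the `bad` pattern is the doc example deliberately altered to use plain parentheses around `opt\.`.

```python
import re

def capture_count(pattern):
    """Number of capturing groups in a pattern (non-capturing ones are ignored)."""
    return re.compile(pattern).groups

good = r"^https?://(?:opt\.)?chkpt\.zdnet\.com/chkpt/\w+/(.*)$"
bad = r"^https?://(opt\.)?chkpt\.zdnet\.com/chkpt/\w+/(.*)$"  # altered on purpose

print(capture_count(good))  # 1
print(capture_count(bad))   # 2

# With the altered pattern, the first group is the optional "opt." prefix,
# so it is empty for this URI and the target ends up in the second group.
m = re.match(bad, "http://chkpt.zdnet.com/chkpt/whatever/spammer.domain/yo/dude")
print(m.group(1))  # None
print(m.group(2))  # spammer.domain/yo/dude
```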