Bug 6057 - redirector_pattern doesn't explain what it does
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Documentation
Version: unspecified
Hardware: Other All
Importance: P5 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL: http://spamassassin.apache.org/full/3...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-02-04 06:27 UTC by Dennis German
Modified: 2010-11-16 13:00 UTC

Description Dennis German 2009-02-04 06:27:22 UTC
""""""""""""""""
redirector_pattern

    A regex pattern that matches both the redirector site portion, and the target site portion of a URI. 
    Note: The target URI portion must be surrounded in parentheses and no other part of the pattern may create a backreference.
    Example: http://chkpt.zdnet.com/chkpt/whatever/spammer.domain/yo/dude
      redirector_pattern    /^https?:\/\/(?:opt\.)?chkpt\.zdnet\.com\/chkpt\/\w+\/(.*)$/i
"""""""""""""""""""
1) What happens when the pattern is matched?

2) Perhaps this example is perl code.
    Could this example be explained (what does it do?)

3) Why would this option be helpful?

Please remember that people reading the documentation are trying to
understand how to use SA and may not know perl.

Thank you

See 6056: 
+++++++++++++++++++++++++++++++
Abused by spammers
http://search.yahoo.com/search?y=Search&p=flewduet%2ecom&fr=sfp&ei=UTF-8

Please add redirector_pattern for this pattern to allow effective URI BL hits.
+++++++++++++++++++++++++++++++
Comment 1 AXB 2009-02-04 06:31:09 UTC
Please do not hijack a bug for a help/support question, which belongs on the SA users list.

(In reply to comment #0)

Comment 2 Sidney Markowitz 2009-02-04 07:11:43 UTC
(in reply to comment #1)

I think we can take the kinder view that reporter intended to file this as a doc bug, not "hijacking a bug" for a support question.

(in reply to the bug report)

However that doesn't mean I fully agree. The doc you quoted is in a section titled "Rule Definitions and Privileged Settings". Defining your own rules or modifying those settings requires a certain minimum of expertise in working with perl regexps. If you can't follow the perl regexp example, then you should not be writing your own.

On the other hand, it may be a fair criticism that nowhere in the documentation does it explain why you might want to write a redirector_pattern or what uses it. I think my first cuts at writing an explanation are too wordy. Can anyone suggest a one sentence or less addition to the existing documentation that makes it clear what redirector patterns are used for?
Comment 3 AXB 2009-02-04 07:16:14 UTC
(In reply to comment #2)
> (in reply to comment #1)
> 
> I think we can take the kinder view that reporter intended to file this as a
> doc bug, not "hijacking a bug" for a support question.

I apologize
Comment 4 Dennis German 2009-02-04 11:47:13 UTC
Mr. Markowitz, thank you for your kindness. I agree that the area of rule definition is in the expert area, expert in SA that is. I acknowledge that a very good understanding of perl regexp is necessary. This section, "redirector_pattern", seems far too brief, and an explanation of the regexp and the results would be very helpful in understanding what the rule does, what its results are, and why you would want this feature.
Comment 5 Sidney Markowitz 2009-02-04 12:48:12 UTC
OK, here is a cut at making the documentation clearer. I welcome any suggestions for better language, but if there are none in the next week or so I will update the doc anyway.

redirector_pattern

     A regex pattern that extracts a URI obfuscated by being the target of a redirector site. The pattern must match the entire redirector plus target. The target URI is surrounded in parentheses. There must be no other backreferences generated.

    Example: http://chkpt.zdnet.com/chkpt/whatever/http://spammer.domain/yo/dude is deobfuscated by
      redirector_pattern    /^https?:\/\/(?:opt\.)?chkpt\.zdnet\.com\/chkpt\/\w+\/(.*)$/i
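
To show what the example pattern above extracts, here is a minimal sketch in Python rather than Perl (this particular regex reads the same in both). It is not SpamAssassin code: the function name extract_target is made up for illustration, and re.IGNORECASE stands in for the trailing /i modifier.

```python
import re

# The documented example pattern, without the Perl /.../i delimiters.
REDIRECTOR = re.compile(
    r"^https?:\/\/(?:opt\.)?chkpt\.zdnet\.com\/chkpt\/\w+\/(.*)$",
    re.IGNORECASE,
)

def extract_target(uri):
    """Return the text captured by the parentheses, or None if there is no match."""
    m = REDIRECTOR.match(uri)
    return m.group(1) if m else None

# Prints: http://spammer.domain/yo/dude
print(extract_target(
    "http://chkpt.zdnet.com/chkpt/whatever/http://spammer.domain/yo/dude"
))
```

A URI that does not go through the redirector does not match at all, so nothing is extracted from it.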
Comment 6 Dennis German 2009-02-05 06:49:13 UTC
Mr. Markowitz, thank you for the explanation. I knew what a redirector is but it didn't click (I think "obfuscated" was the keyword). This is a great example of a rule someone would want to write.

http://chkpt.zdnet.com/chkpt/whatever/http://spammer.domain.cc/secure.com 
/^https?:\/\/(?:opt\.)?chkpt\.zdnet\.com\/chkpt\/\w+\/(.*)$/i

string beginning with http and maybe an s  then :// 
maybe a ( then :opt.  maybe a ) then   chkpt.zdnet.com/chkpt/
then one or more words followed by /  
 a ( and all the rest of the characters to the end of the line
   that is the phisher's actual target URI
ignoring case

((I am unclear regarding the   (:opt) portion?? ))
(( could you explain it?   ))

A help would be a reference to  www.spaweditor.com/scripts/regex
and  regex.powertoy.org  

Comment 7 Karsten Bräckelmann 2009-02-05 07:56:56 UTC
Dennis, this bug is about the description being too vague and low on information, which should be enhanced by adding some details. It is however *not* about explaining and dissecting the given example RE.

Seriously, that whole section of the docs is almost exclusively for admins. It's taken for granted that an admin who is going to write custom rules *does* understand REs already. Let alone more esoteric stuff like redirector_pattern, which frankly only rarely needs to be used by anyone.

Moreover, IMHO adding references to RE tutorials is out of the scope of the entire SA documentation, more so for this specific option. Any reference if at all should be 'man perlre' anyway. ;)


If you need help understanding the example, the users mailing list is the appropriate place to ask. Bugzilla is for handling and tracking bugs -- not usage questions.

FWIW, your dissection of the example RE isn't correct. One hint: (?:foo) is for grouping without creating backreferences. See the note in the description about backreferences being forbidden -- cause it's used to extract the one embedded, actual target URI. [1]

That all said, I'm still kind of wondering why you are poking at redirector_pattern in the first place. Any need for that, or would a simple rule do?


[1] Which maybe could be added to the description, just to explain why other
    backreferences are not allowed.
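
The hint about (?:foo) can be illustrated with a short sketch, again in Python rather than Perl. It assumes, as the description implies, that whatever lands in the first capture group is taken as the target URI. The second pattern is a deliberately wrong variant written for this comparison and does not come from the documentation.

```python
import re

uri = "http://opt.chkpt.zdnet.com/chkpt/whatever/http://spammer.domain/yo/dude"

# (?:opt\.)? only groups, so the target URI stays in capture group 1.
non_capturing = re.compile(
    r"^https?://(?:opt\.)?chkpt\.zdnet\.com/chkpt/\w+/(.*)$", re.IGNORECASE
)
# (opt\.)? also captures, so it takes group 1 and pushes the target to group 2.
capturing = re.compile(
    r"^https?://(opt\.)?chkpt\.zdnet\.com/chkpt/\w+/(.*)$", re.IGNORECASE
)

good = non_capturing.match(uri)
bad = capturing.match(uri)

print(good.group(1))  # http://spammer.domain/yo/dude
print(bad.group(1))   # opt.
```

With the extra capturing group, the first backreference holds "opt." instead of the embedded URI, which is why no other backreference is allowed.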


Oh, and why does FF spell-checking not know the word "backreference"? ;)
Comment 8 Sidney Markowitz 2009-02-05 11:41:25 UTC
Yes, explaining even esoteric aspects of regexps is outside the scope of a doc string in the code and of a bugzilla entry. I didn't realize that the reason to avoid any other backreferences would not be clear, but maybe something can be mentioned...


redirector_pattern

     A regex pattern that extracts a URI obfuscated by being the target of a redirector site. The pattern must match the entire redirector plus target. The target URI is identified by surrounding it in parentheses to make it the one backreference generated by the pattern: No other may be generated. URI parsing checks every URI string against every redirector_pattern that is defined.

    Example: http://chkpt.zdnet.com/chkpt/whatever/http://spammer.domain/yo/dude is deobfuscated by
      redirector_pattern    /^https?:\/\/(?:opt\.)?chkpt\.zdnet\.com\/chkpt\/\w+\/(.*)$/i
Comment 9 Karsten Bräckelmann 2009-02-06 05:15:20 UTC
Another aspect of this option that might not be clear is its actual purpose. That is, the target URI will be used for URIBL lookups. Thus I propose the following, slightly expanded description, based on Sidney's proposal.

  A regex pattern that extracts a URI obfuscated by being the target of a
  redirector site.  The target URI will be used for URIBL lookups.

  The pattern must match the entire redirector plus target.  The target URI
  is identified by surrounding it in parentheses to make it the one
  backreference generated by the pattern.  No other backreference may be
  generated.  URI parsing checks every URI string against every
  redirector_pattern that is defined.


Anyone with better English skills than me, who can eliminate the repeated use of "generated"? ;)

Also, IMHO, this rarely used option is placed too prominently in the Rule Definitions. It probably should be moved to the end of the section.
Comment 10 Per Jessen 2010-11-16 09:54:59 UTC
  A regex pattern that extracts the target URI from a redirector URI.  
  The target URI will be used for URIBL lookups.

  The pattern must match the entire redirector URI including the target.
  The target URI is identified by surrounding it in parentheses to make it
  the first capturing backreference in the regex. Only one backreference is
  allowed, others should be made non-capturing.
  URI parsing checks every URI string against every redirector_pattern that 
  is defined.
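
The last sentence, together with the URIBL remark, could be sketched roughly as follows. This is an illustration in Python, not SpamAssassin's actual parsing code. The second pattern (redirect.example.com) and the names PATTERNS, targets and target_hosts are hypothetical, and only the step of finding the host name is shown, not the URIBL lookup itself.

```python
import re
from urllib.parse import urlsplit

# Stand-ins for the defined redirector_pattern lines. Only the first
# comes from the documentation; the second is a made-up example.
PATTERNS = [
    re.compile(r"^https?://(?:opt\.)?chkpt\.zdnet\.com/chkpt/\w+/(.*)$", re.IGNORECASE),
    re.compile(r"^https?://redirect\.example\.com/\?to=(.*)$", re.IGNORECASE),
]

def targets(uri):
    """Check one URI against every pattern and collect each captured target."""
    found = []
    for pattern in PATTERNS:
        m = pattern.match(uri)
        if m:
            found.append(m.group(1))
    return found

def target_hosts(uri):
    """Host names of the extracted targets, i.e. what a URIBL check would look at."""
    hosts = []
    for target in targets(uri):
        host = urlsplit(target).hostname  # None if the target has no scheme://host
        if host and host not in hosts:
            hosts.append(host)
    return hosts

print(target_hosts("http://chkpt.zdnet.com/chkpt/whatever/http://spammer.domain/yo/dude"))
```

Without a matching pattern, only the redirector's own host is visible in the URI, and the spammer's domain hides behind it.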
Comment 11 Karsten Bräckelmann 2010-11-16 13:00:28 UTC
Thanks, Per, sounds good to me. With one nit-pick.

(In reply to comment #10)
>   the first capturing backreference in the regex. Only one backreference is
>   allowed, others should be made non-capturing.

"others" in this context logically refers to "backreference", no? A non-capturing backreference is a contradiction in terms, because the capturing is what makes it a backreference to begin with. The last part probably should explicitly mention parentheses and grouping instead.

  Only one backreference is allowed, other grouping use of parentheses
  should be made non-capturing.
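
One way to check a candidate pattern against this rule before using it is to count its capturing groups. The helper below is a hypothetical sketch in Python. It relies on the Python regex engine, so it only applies to patterns whose syntax is valid in both Perl and Python.

```python
import re

def has_single_capture(pattern_text):
    """True if the pattern contains exactly one capturing group."""
    return re.compile(pattern_text).groups == 1

# Documented example: (?:opt\.)? is non-capturing, (.*) is the one capture.
print(has_single_capture(r"^https?://(?:opt\.)?chkpt\.zdnet\.com/chkpt/\w+/(.*)$"))
# Same pattern with (opt\.)? left capturing: two groups, so the rule is broken.
print(has_single_capture(r"^https?://(opt\.)?chkpt\.zdnet\.com/chkpt/\w+/(.*)$"))
```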