SA Bugzilla – Bug 6973
google translate redirector_pattern is incomplete
Last modified: 2023-06-29 23:43:41 UTC
The Google Translate redirector pattern doesn't cover all of the possible URLs supported by Google's translation API. Valid URLs include https?://translate.google.com/translate_[ct]/, but the redirector_pattern provided with SpamAssassin only matches https://translate.google.com/translate/. A more complete pattern is: redirector_pattern m'^http:/*(?:\w+\.)?google(?:\.\w{2,3}){1,2}/translate(_[ct])?\?.*?(?<=[?&])u=(.*?)(?:$|[&\#])'i
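As a rough illustration of the gap described in this report, the following Python sketch runs two patterns against sample URLs. Python's re module stands in for Perl here. STOCK is an assumed reconstruction of the shipped rule (the report's pattern with the (_[ct])? part removed), not a copy of the actual file, and the example.com URLs are made up.

```python
import re

# Assumed reconstruction of the shipped rule: the pattern from the report
# minus the optional (_[ct]) part. Not copied from 72_active.cf.
STOCK = re.compile(
    r"^http:/*(?:\w+\.)?google(?:\.\w{2,3}){1,2}/translate"
    r"\?.*?(?<=[?&])u=(.*?)(?:$|[&\#])",
    re.IGNORECASE,
)

# Pattern proposed in the report, transcribed for Python's re module.
PROPOSED = re.compile(
    r"^http:/*(?:\w+\.)?google(?:\.\w{2,3}){1,2}/translate(_[ct])?"
    r"\?.*?(?<=[?&])u=(.*?)(?:$|[&\#])",
    re.IGNORECASE,
)

# Hypothetical sample URLs; example.com is a placeholder target.
SAMPLES = [
    "http://translate.google.com/translate?hl=en&u=http://example.com/",
    "http://translate.google.com/translate_c?hl=en&u=http://example.com/",
    "http://translate.google.co.ke/translate_t?u=http://example.com/&sl=auto",
]


def coverage(pattern):
    """Return one boolean per sample: does the pattern match it?"""
    return [pattern.search(url) is not None for url in SAMPLES]


if __name__ == "__main__":
    print("stock:   ", coverage(STOCK))
    print("proposed:", coverage(PROPOSED))
```

Under these assumptions only the plain /translate? URL is caught by the stock form, while the proposed form also catches the /translate_c and /translate_t variants.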
(In reply to Chris Myers from comment #0)
> redirector_pattern
> m'^http:/*(?:\w+\.)?google(?:\.\w{2,3}){1,2}/translate(_[ct])?\?.
> *?(?<=[?&])u=(.*?)(?:$|[&\#])'i
ITYM:
    m'^https?:/*
    -------^^
The redirector_pattern in the report began life as a cut-and-paste from my updates_spamassassin_org/72_active.cf file. It really says just http:// rather than https?:// (which I agree is an improvement). My change to the pattern is actually changing .../translate\? to /translate(_[ct])?.
errr actually I meant to say "/translate(_[ct])\?" with the backslash. :-(
(In reply to Chris Myers from comment #2)
> The redirector_pattern in the report began life as a cut-and-paste from my
> updates_spamassassin_org/72_active.cf file. It really says just http://
> rather than https?:// (which I agree is an improvement).
Indeed? I didn't actually check the current sources - if so, that's a hole.
> My change to the
> pattern is actually changing .../translate\? to /translate(_[ct])?.
...or /translate(?:_[ct])?\? :)
Can you provide a pointer to a spec from Google that documents the possible formats? Or was this just from observation?
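The two suggestions in this comment, https? and a non-capturing (?:_[ct])?, can be sketched in Python (standing in for Perl regex syntax). The combined patterns below are assembled from those suggestions for illustration and are not taken from the SpamAssassin sources. The sample URL and the helper first_group are hypothetical. The sketch shows that with the capturing (_[ct])? the first group holds the path suffix rather than the target URL, which matters if the consumer of the pattern reads the target from the first group.

```python
import re

# Shared pieces of the pattern, with the https? change applied.
HEAD = r"^https?:/*(?:\w+\.)?google(?:\.\w{2,3}){1,2}/translate"
TAIL = r"\?.*?(?<=[?&])u=(.*?)(?:$|[&\#])"

# Variant with the extra capturing group, as in the original report.
CAPTURING = re.compile(HEAD + r"(_[ct])?" + TAIL, re.IGNORECASE)

# Variant with the non-capturing group suggested in this comment.
NON_CAPTURING = re.compile(HEAD + r"(?:_[ct])?" + TAIL, re.IGNORECASE)

# Hypothetical sample URL; example.com is a placeholder target.
URL = (
    "https://translate.google.com/translate_c"
    "?hl=en&u=http://example.com/page&sl=auto"
)


def first_group(pattern, url):
    """Return the text of capture group 1, or None if there is no match."""
    match = pattern.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    print("capturing:    ", first_group(CAPTURING, URL))
    print("non-capturing:", first_group(NON_CAPTURING, URL))
```

With the non-capturing form the pattern has exactly one group, so the redirect target is unambiguous.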
> Indeed? I didn't actually check the current sources - if so, that's a hole.
Yup. Agreed that getting rid of the unneeded backreference is probably a beneficial thing. I don't live and breathe Perl REs. I've seen /translate_c and /translate_t referred to by users on the Internet (such as http://googlesystem.blogspot.com/2008/03/useful-google-translate-addresses.html) but didn't find any actual Google doc -- it may be an internal thing rather than part of the public API. This particular bug report is driven by an actual spam message that referenced a URL beginning with: http://translate.google.co.ke/tran%73%6C%61te_c?hl=<omitted>
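The URL prefix quoted above hides part of the path behind percent-escapes. A minimal Python sketch of what that obfuscation decodes to follows. PATH_ONLY is an illustrative cut-down of the pattern covering only host and path, and the omitted query string is deliberately not reconstructed. Whether SpamAssassin decodes such escapes before applying redirector_pattern is not stated in this thread.

```python
import re
from urllib.parse import unquote

# Visible prefix of the URL quoted in the report; the query string is
# omitted there and is not reconstructed here.
RAW_PREFIX = "http://translate.google.co.ke/tran%73%6C%61te_c"

# Host-and-path part of the pattern only (an illustrative cut-down).
PATH_ONLY = re.compile(
    r"^https?:/*(?:\w+\.)?google(?:\.\w{2,3}){1,2}/translate(?:_[ct])?$",
    re.IGNORECASE,
)

# Percent-decode the escapes: %73 -> "s", %6C -> "l", %61 -> "a".
decoded = unquote(RAW_PREFIX)

raw_matches = PATH_ONLY.search(RAW_PREFIX) is not None
decoded_matches = PATH_ONLY.search(decoded) is not None

if __name__ == "__main__":
    print(decoded)
    print("raw matches:    ", raw_matches)
    print("decoded matches:", decoded_matches)
```

In this sketch the escaped form slips past the path check and only the decoded form matches, so the order of decoding and matching matters for any rule of this kind.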