Bug 7956

Summary: config: invalid regexp for __URI_TRY_3LD
Product: Spamassassin
Reporter: jnewman67 <jnewman67>
Component: Rules
Assignee: SpamAssassin Developer Mailing List <dev>
Status: RESOLVED FIXED
Severity: normal
CC: apache-bugzilla, apache, apache, billcole, covex, jhardin, jnewman67, jnieuw, lameventanas, lists, matthys70, ollitech, pc, postmaster, spamassassin
Priority: P2
Version: 3.4.6
Target Milestone: Undefined
Hardware: PC
OS: Linux
Whiteboard:
Attachments: possible fix

Description jnewman67 2022-02-18 06:40:22 UTC
running sa-update as root yields the following:

config: invalid regexp for __URI_TRY_3LD 'm,^https?://(?:try(?!r\.codeschool)|start|get(?!\.adobe)|save|check(?!out)|act|compare|join|learn(?!ing)|request|visit(?!or|\.vermont)|my(?!sub|turbotax|news\.apple|a\.godaddy|account|support|build|blob)\w)[^.]*\.[^/]+\.(?<!list-manage\.)(?:com|net)\b,i': Variable length lookbehind not implemented in regex m/(?i)^https?://(?:try(?!r\.codeschool)|start|get(?!\.adobe)|save|check(?!out)|act|compare|join|learn(?!ing)|request|vi.../
channel 'updates.spamassassin.org': lint check of update failed, channel failed

codeschool is referenced only in the updates_spamassassin_org/72_active.cf file

it appears there might be a typo in the line making the regexp format incorrect, though I'm not qualified to say specifically what it is.

I do see that the files in /var/lib/spamassassin/3.004006/updates_spamassassin_org are dated Feb 7 (2022) and that MIRRORED.BY has today's date (Feb 18) and time (01:17), so it's downloading the updates correctly. I just don't know whether it's finishing the update and using them afterwards.

Thank you.
Comment 1 jnewman67 2022-02-18 06:42:19 UTC
forgot to mention, system is CentOS 8 Stream, 4.18.0-365.el8.x86_64, fully updated as of 2/18/2022
Comment 2 Adam Pribyl 2022-02-18 07:03:02 UTC
Same on Debian 10: either some new Perl syntax is being used or there is a mistake in this file.
Comment 3 Matthijs 2022-02-18 08:10:31 UTC
I can confirm as well: I'm running Ubuntu servers and all of them got the same message, though I'm using:

Feb 18 09:00:06.033 [81811] dbg: channel: selected mirror http://sa-update.razx.cloud
Feb 18 09:00:06.033 [81811] dbg: http: url: http://sa-update.razx.cloud/1898145.tar.gz

Tried to get from metadata version 1898122 to 1898145
Comment 4 pasan 2022-02-18 08:16:00 UTC
Also the same. With rules 1898122 installed, today I tried to update to 1898145 and got this error.
Comment 5 Matthijs 2022-02-18 08:19:23 UTC
Problem is not related to sa-update but to the RULES
Comment 6 Adam Pribyl 2022-02-18 09:10:22 UTC
Yes, most probably it is related to rules, not sa-update itself. Could the reporter switch the component?
Comment 7 jnewman67 2022-02-18 09:42:28 UTC
changed from sa-update to RULES
Comment 8 Giovanni Bechis 2022-02-18 10:18:45 UTC
Created attachment 5762
possible fix

I cannot reproduce the issue; however, the attached patch may fix the problem.
Comment 9 Alan 2022-02-18 11:45:38 UTC
@Giovanni: your patch doesn't fix it.

What should fix it is adding the /aa modifier to the regex.
Comment 10 Henrik Krohns 2022-02-18 12:12:25 UTC
(In reply to Alan from comment #9)
>
> What should fix it is adding the /aa modifier to the regex.

Note that /aa only works properly on Perl 5.16+

It already is added automatically in SpamAssassin trunk/4.0 when appropriate, so never use it in stock rules.

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=6802

John should just ditch that lookbehind in the rule; it can be done just as easily with a lookahead, etc.
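
For illustration, here is a rough sketch of the lookahead idea on a cut-down version of the rule. It is written in Python rather than Perl, the prefix list is shortened, and the pattern is limited to exactly three labels (`[^/.]+` where the rule has `[^/]+`), so treat it as a simplification under those assumptions and not as the fix that was committed. Python's `re` does not apply multi-character case folding, so this shows only the shape of the pattern, not the Perl error. `URI_TRY_3LD_SKETCH` and `hits` are hypothetical names.

```python
import re

# Cut-down, hypothetical variant of __URI_TRY_3LD: only a few prefixes,
# and exactly three labels (host.domain.tld) to keep the sketch short.
URI_TRY_3LD_SKETCH = re.compile(
    r"^https?://"
    r"(?:try|start|get|join)\w[^.]*\."   # third-level label, e.g. "trynow."
    r"(?!list-manage\.(?:com|net)\b)"    # lookahead: reject list-manage.com/.net
    r"[^/.]+\.(?:com|net)\b",            # second-level label and TLD
    re.IGNORECASE,
)


def hits(uri):
    """Return True if the sketch pattern matches the URI."""
    return URI_TRY_3LD_SKETCH.search(uri) is not None
```

With this sketch, `hits("http://trynow.example.com/x")` is true and `hits("http://trynow.list-manage.com/x")` is false. The exclusion is checked before the domain label is consumed, so no lookbehind is needed.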
Comment 11 Henrik Krohns 2022-02-18 12:27:00 UTC
As described in Bug 6802, the culprit is "st". It expands similarly to "ss".

A possible workaround might also be escaping "st" as "s[t]"; at least here it stops complaining.

(?<!lis[t]-manage\.)
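
To illustrate why "st" behaves like "ss", here is a small sketch in Python, whose `str.casefold()` applies Unicode full case folding. Perl's regex engine is a separate implementation, so this shows only the folding data and not the Perl error itself. A few single characters fold to two letters, so a case-insensitive literal containing "st" or "ss" can match text of more than one length, which a fixed-length lookbehind cannot accept. `fold_lengths` is a hypothetical helper name.

```python
# Single code points whose full case folding is a two-letter string.
MULTI_CHAR_FOLDS = {
    "\u00df": "ss",  # LATIN SMALL LETTER SHARP S
    "\ufb05": "st",  # LATIN SMALL LIGATURE LONG S T
    "\ufb06": "st",  # LATIN SMALL LIGATURE ST
}


def fold_lengths(text):
    """Return (length before case folding, length after case folding)."""
    return len(text), len(text.casefold())


# Sanity check that each character really folds to the two-letter form.
for char, folded in MULTI_CHAR_FOLDS.items():
    assert char.casefold() == folded
```

For example, `"li\ufb06-manage."` is 11 characters long but folds to the 12-character string `"list-manage."`.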
Comment 12 Henrik Krohns 2022-02-18 12:29:10 UTC
(In reply to Henrik Krohns from comment #11)
>
> A possible workaround might also be escaping "st" to "s[t]", atleast here it
> stops complaining..
> 
> (?<!lis[t]-manage\.)

Never mind: after trying many versions, it seems to start complaining again after Perl 5.30...

So I'd still suggest just not using the lookbehind in this case..
Comment 13 pasan 2022-02-18 15:12:19 UTC
Could this work here?
lis.{0}t-manage
Comment 14 jnewman67 2022-02-18 15:46:04 UTC
more clarification:

sa-update --version reports

sa-update version 3.4.6 / svn1881784
  running on Perl version 5.26.3

and in /var/lib/spamassassin/3.004006/updates_spamassassin_org there's a 1898145.tar.gz file, along with its ASC and SHA512 counterparts

so I'm assuming my system is trying to update from 1881784 to 1898145

## NOTE - I have NEVER seen this much active work to fix a bug I've reported anywhere, in this short amount of time, over the past 30 years.  Very impressed!
Comment 15 John Hardin 2022-02-18 16:02:02 UTC
(In reply to Henrik Krohns from comment #11)
> Like described in Bug 6802, the culprit is "st". It expands similarly to
> "ss".
> 
> A possible workaround might also be escaping "st" to "s[t]", atleast here it
> stops complaining..
> 
> (?<!lis[t]-manage\.)

Dammit, it passed lint here or I wouldn't have committed it. Perl 5.34

(In reply to Henrik Krohns from comment #10)

> John should just ditch that lookbehind in the rule, it can be as easily done
> with a lookahead etc.

I tried a lookahead first and it complicated the rule more than I wanted. There are lookbehind assertions elsewhere that don't cause problems, but they may not use ss/st text.

Taking a look...
Comment 16 spamassassin 2022-02-18 16:27:35 UTC
How about dropping the global /i and using local ones outside the assertion?

m,^(?i:https?)://(?i:try(?!r\.codeschool)|start|get(?!\.adobe)|save|check(?!out)|act|compare|join|learn(?!ing)|request|visit(?!or|\.vermont)|my(?!sub|turbotax|news\.apple|a\.godaddy|account|support|build|blob)\w)[^.]*\.[^/]+\.(?<![lL][iI][sS][tT]-[mM][aA][nN][aA][gG][eE]\.)(?i:com|net)\b,
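
The per-letter character classes inside the lookbehind above are tedious to type by hand. As a sketch only (Python, ASCII literals assumed, `explicit_case` is a hypothetical helper and not part of SpamAssassin), they could be generated like this:

```python
# Characters that need a backslash when used literally in a regex.
REGEX_SPECIALS = set(".^$*+?()[]{}|\\")


def explicit_case(literal):
    """Spell out both cases of every letter so no /i flag is needed."""
    parts = []
    for ch in literal:
        if ch.lower() != ch.upper():      # cased letter: emit [xX]
            parts.append("[" + ch.lower() + ch.upper() + "]")
        elif ch in REGEX_SPECIALS:        # escape regex metacharacters
            parts.append("\\" + ch)
        else:                             # digits, "-", etc. stay as-is
            parts.append(ch)
    return "".join(parts)
```

Calling `explicit_case("list-manage.")` yields the same `[lL][iI][sS][tT]-[mM][aA][nN][aA][gG][eE]\.` fragment used in the pattern. Each class matches exactly one character, so the assertion keeps a fixed length.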
Comment 17 John Hardin 2022-02-18 16:33:08 UTC
I have a fix but I won't be able to check it in until lunch.
Comment 18 pasan 2022-02-18 17:34:28 UTC
(In reply to spamassassin from comment #16)
> How about dropping the global /i and using local ones outside the assertion?
> 
> m,^(?i:https?)://(?i:try(?!r\.codeschool)|start|get(?!\.adobe)|save|check(?!
> out)|act|compare|join|learn(?!ing)|request|visit(?!or|\.vermont)|my(?!
> sub|turbotax|news\.apple|a\.godaddy|account|support|build|blob)\w)[^.]*\.[^/
> ]+\.(?<![lL][iI][sS][tT]-[mM][aA][nN][aA][gG][eE]\.)(?i:com|net)\b,

I think this could work:

(?<!lis.{0}t-manage\.)
Comment 19 John Hardin 2022-02-18 22:05:40 UTC
Commit Modified /home/jhardin/develop/spamassassin/svn/trunk/rulesrc/sandbox/jhardin/20_misc_testing.cf
Committed revision 1898196.

What may still need to be looked at is why the lookbehind passed lint in the dev and masscheck environments...
Comment 20 Jeffrey Goh 2022-02-19 03:56:48 UTC
Giovanni,

I have this happening couple update sequences now, am stuck on 1898122:

Update available for channel updates.spamassassin.org: 1898122 -> 1898145
Update available for channel updates.spamassassin.org: 1898122 -> 1898171

How can I apply the patch?  Or do I need to wait for the channel to update?
Comment 21 Jeffrey Goh 2022-02-19 04:00:21 UTC
(In reply to John Hardin from comment #19)
> Commit Modified
> /home/jhardin/develop/spamassassin/svn/trunk/rulesrc/sandbox/jhardin/
> 20_misc_testing.cf
> Committed revision 1898196.
> 
> What may still need to be looked at is why the lookbehind passed lint in the
> dev and masscheck environments...

ah ok - will wait for 1898196 to appear on the channel then.
Comment 22 Henrik Krohns 2022-02-19 08:16:05 UTC
*** Bug 7957 has been marked as a duplicate of this bug. ***
Comment 23 schicken 2022-02-19 11:46:26 UTC
It is still not working for me (2022-02-19, 11:46 UTC):

$ sa-update -v
Update available for channel updates.spamassassin.org: 1898122 -> 1898171
http: (curl) GET http://sa-update.spamassassin.org/1898171.tar.gz, success
http: (curl) GET http://sa-update.spamassassin.org/1898171.tar.gz.sha512, success
http: (curl) GET http://sa-update.spamassassin.org/1898171.tar.gz.asc, success
config: invalid regexp for __URI_TRY_3LD 'm,^https?://(?:try(?!r\.codeschool)|start|get(?!\.adobe)|save|check(?!out)|act|compare|join|learn(?!ing)|request|visit(?!or|\.vermont)|my(?!sub|turbotax|news\.apple|a\.godaddy|account|support|build|blob)\w)[^.]*\.[^/]+\.(?<!list-manage\.)(?:com|net)\b,i': Variable length lookbehind is experimental in regex; marked by <-- HERE in m/(?i)^https?://(?:try(?!r\.codeschool)|start|get(?!\.adobe)|save|check(?!out)|act|compare|join|learn(?!ing)|request|visit(?!or|\.vermont)|my(?!sub|turbotax|news\.apple|a\.godaddy|account|support|build|blob)\w)[^.]*\.[^/]+\.(?<!list-manage\.)(?:com|net)\b <-- HERE /

I tried this, cause I'm sure is a very wise weasel:
$ rm -rf /var/lib/spamassassin/3.004006/updates_spamassassin_org*
debian@mail:~$ sudo sa-update -v
Update available for channel updates.spamassassin.org: -1 -> 1898171
http: (curl) GET http://spamassassin.apache.org/updates/MIRRORED.BY, success
http: (curl) GET http://sa-update.verein-clean.net/1898171.tar.gz, success
http: (curl) GET http://sa-update.verein-clean.net/1898171.tar.gz.sha512, success
http: (curl) GET http://sa-update.verein-clean.net/1898171.tar.gz.asc, success
config: invalid regexp for __URI_TRY_3LD 'm,^https?://(?:try(?!r\.codeschool)|start|get(?!\.adobe)|save|check(?!out)|act|compare|join|learn(?!ing)|request|visit(?!or|\.vermont)|my(?!sub|turbotax|news\.apple|a\.godaddy|account|support|build|blob)\w)[^.]*\.[^/]+\.(?<!list-manage\.)(?:com|net)\b,i': Variable length lookbehind is experimental in regex; marked by <-- HERE in m/(?i)^https?://(?:try(?!r\.codeschool)|start|get(?!\.adobe)|save|check(?!out)|act|compare|join|learn(?!ing)|request|visit(?!or|\.vermont)|my(?!sub|turbotax|news\.apple|a\.godaddy|account|support|build|blob)\w)[^.]*\.[^/]+\.(?<!list-manage\.)(?:com|net)\b <-- HERE /

channel 'updates.spamassassin.org': lint check of update failed, channel failed
Update failed, exiting with code 4

Here are details for my environment (I'm running on Debian 11, from Debian packages):

$ sa-update -V
sa-update version 3.4.6 / svn1881784
  running on Perl version 5.32.1

$ spamassassin -V
SpamAssassin version 3.4.6
  running on Perl version 5.32.1
Comment 24 Matthijs 2022-02-19 12:09:05 UTC
Can confirm as well:

# sa-update -V
sa-update version 3.4.4 / svn1869639
  running on Perl version 5.30.0
# sa-update -v
Update available for channel updates.spamassassin.org: 1898122 -> 1898171
http: (curl) GET http://sa-update-asf.snb.it/1898171.tar.gz, success
http: (curl) GET http://sa-update-asf.snb.it/1898171.tar.gz.sha512, success
http: (curl) GET http://sa-update-asf.snb.it/1898171.tar.gz.asc, success
gpg: WARNING: unsafe ownership on homedir '/etc/spamassassin/sa-update-keys'
config: invalid regexp for __URI_TRY_3LD 'm,^https?://(?:try(?!r\.codeschool)|start|get(?!\.adobe)|save|check(?!out)|act|compare|join|learn(?!ing)|request|visit(?!or|\.vermont)|my(?!sub|turbotax|news\.apple|a\.godaddy|account|support|build|blob)\w)[^.]*\.[^/]+\.(?<!list-manage\.)(?:com|net)\b,i': Variable length lookbehind is experimental in regex; marked by <-- HERE in m/(?i)^https?://(?:try(?!r\.codeschool)|start|get(?!\.adobe)|save|check(?!out)|act|compare|join|learn(?!ing)|request|visit(?!or|\.vermont)|my(?!sub|turbotax|news\.apple|a\.godaddy|account|support|build|blob)\w)[^.]*\.[^/]+\.(?<!list-manage\.)(?:com|net)\b <-- HERE /

channel 'updates.spamassassin.org': lint check of update failed, channel failed
Update failed, exiting with code 4

Why was it not tested, or why not split it up if it is too complex?
Comment 25 ollitech 2022-02-19 12:50:39 UTC
As I understand it, we're waiting for revision 1898196, but the default channel continues to advertise 1898171 as the latest...

sa-update --channel updates.spamassassin.org -v
Update available for channel updates.spamassassin.org: 1898122 -> 1898171


It seems DNS has not been updated, or is it just slow to propagate the new version?

```
dig +short txt 4.4.3.updates.spamassassin.org
3.3.3.updates.spamassassin.org.
"1898171"
```
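
Judging only from the debug output elsewhere in this thread (3.4.4 is looked up as `4.4.3.updates.spamassassin.org`), the TXT record name appears to be the SpamAssassin version with its components reversed, followed by the channel name. Below is a small Python sketch under that assumption. `update_txt_name` and `parse_dig_short` are hypothetical helper names, not part of sa-update.

```python
def update_txt_name(sa_version, channel="updates.spamassassin.org"):
    """Build the DNS name, assuming reversed version components + channel."""
    reversed_version = ".".join(reversed(sa_version.split(".")))
    return reversed_version + "." + channel


def parse_dig_short(output):
    """Pick the quoted TXT value out of `dig +short txt` output."""
    for line in output.splitlines():
        line = line.strip()
        # The CNAME target is unquoted; the TXT payload is in double quotes.
        if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
            return int(line.strip('"'))
    raise ValueError("no TXT value found")
```

Feeding the two-line `dig` output shown above to `parse_dig_short` gives 1898171, the version the channel was still advertising at that point.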
Comment 26 Henrik Krohns 2022-02-19 13:39:20 UTC
Chill out, guys. It always takes 1-2 days for things to pass through the masscheck into sa-update.
Comment 27 John Hardin 2022-02-19 22:52:28 UTC
(In reply to Matthijs from comment #24)
> Why was it not tested or why do not spilt-up if to complex?

It *was* tested. It passed lint in my dev environment or I wouldn't have checked in the change. It passed lint at masscheck or it wouldn't have been published.

There is apparently some difference between my and masscheck's environments and the production environments where this is failing, probably related to Unicode options.

It's still not available.

jhardin@davinci ~ $ date -u ; dig +short txt 4.4.3.updates.spamassassin.org
Sat Feb 19 10:51:16 PM UTC 2022
3.3.3.updates.spamassassin.org.
"1898171"
Comment 28 Jeffrey Goh 2022-02-20 05:00:57 UTC
sa-update -v will tell you the before and after versions.
If you run it twice, you will see that the update was rejected because of the error, and that you are still on the old version.

Regards

(In reply to jnewman67 from comment #0)
> running sa-update as root yields the following:
> 
> config: invalid regexp for __URI_TRY_3LD
> 'm,^https?://(?:try(?!r\.codeschool)|start|get(?!\.adobe)|save|check(?!
> out)|act|compare|join|learn(?!ing)|request|visit(?!or|\.vermont)|my(?!
> sub|turbotax|news\.apple|a\.godaddy|account|support|build|blob)\w)[^.]*\.[^/
> ]+\.(?<!list-manage\.)(?:com|net)\b,i': Variable length lookbehind not
> implemented in regex
> m/(?i)^https?://(?:try(?!r\.codeschool)|start|get(?!\.adobe)|save|check(?!
> out)|act|compare|join|learn(?!ing)|request|vi.../
> channel 'updates.spamassassin.org': lint check of update failed, channel
> failed
> 
> codeschool is referenced only in the updates_spamassassin_org/72_active.cf
> file
> 
> it appears there might be a typo in the line making the regexp format
> incorrect, though I'm not qualified to say specifically what it is.
> 
> I do see that the files in
> /var/lib/spamassassin/3.004006/updates_spamassassin_org are dated Feb 7
> (2022) and that MIRRORED.BY has todays date (Feb 18) and time (01:17), so
> it's downloading the updates correctly.  I just don't know if it's finishing
> the update and using them afterwards.
> 
> Thank you.
Comment 29 Jeffrey Goh 2022-02-20 05:04:08 UTC
works for me now - thanks to whoever finally pushed the update thru, and also to all who make and keep this project awesomely useful.

# sa-update -v
Update available for channel updates.spamassassin.org: -1 -> 1898205
http: (curl) GET http://spamassassin.apache.org/updates/MIRRORED.BY, success
http: (curl) GET http://sa-update.dnswl.org/1898205.tar.gz, success
http: (curl) GET http://sa-update.dnswl.org/1898205.tar.gz.sha512, success
http: (curl) GET http://sa-update.dnswl.org/1898205.tar.gz.asc, success
Update was available, and was downloaded and installed successfully
Comment 30 jnewman67 2022-02-20 05:59:12 UTC
I'll second the thanks!

I'm still only seeing svn1881784 when I run sa-update (without errors now), but I'll check back tomorrow.
Comment 31 Matthijs 2022-02-20 06:54:12 UTC
Thanks, seems to work now ... no errors:

Feb 20 07:52:42.205 [246619] dbg: channel: metadata version = 1898205, from file /var/lib/spamassassin/3.004004/updates_spamassassin_org.cf
Feb 20 07:52:42.217 [246619] dbg: dns: 4.4.3.updates.spamassassin.org => 1898205, parsed as 1898205
Feb 20 07:52:42.217 [246619] dbg: channel: current version is 1898205, new version is 1898205, skipping channel
Comment 32 jnewman67 2022-02-20 17:44:01 UTC
looks like mine updated as well, though it's not clear:

debug says:  dbg: channel: current version is 1898205, new version is 1898205

sa-update -V shows: sa-update version 3.4.6 / svn1881784

but it says skipping channel, so I'm going to call it good, unless someone sees an issue with the output above.

Thank you to all who make this possible, and those that make this happen.
Honestly, one of the better online responses I've seen in a while, and very grateful to know there are people that still care this much about making sure the world keeps spinning in some realm of normalcy.

Thank you.
Comment 33 Bill Cole 2022-02-20 22:01:58 UTC
(In reply to jnewman67 from comment #32)
> looks like mine updated as well, though it's not clear:
> 
> debug says:  dbg: channel: current version is 1898205, new version is 1898205
> 
> sa-update -V shows: sa-update version 3.4.6 / svn1881784

Using "-V" (CAPITAL V) returns the versions of SA itself, the sa-update script, & Perl. 

Using "-v" (lowercase v) gives "verbose" output, which includes mention of the latest version according to DNS.
Comment 34 Jeffrey Goh 2022-02-21 02:41:23 UTC
(In reply to jnewman67 from comment #32)
> looks like mine updated as well, though it's not clear:
> 
> debug says:  dbg: channel: current version is 1898205, new version is 1898205
> 
> sa-update -V shows: sa-update version 3.4.6 / svn1881784
> 
> but it says skipping channel, so I'm going to call it good, unless someone
> sees an issue with the output above.
> 
> Thank you to all who make this possible, and those that make this happen.
> Honestly, one of the better online responses I've seen in a while, and very
> grateful to know there are people that still care this much about making
> sure the world keeps spinning in some realm of normalcy.
> 
> Thank you.

1898205 is >= 1898196, and the update ran successfully, so you have the patched version, same as me. As Bill Cole pointed out in comment 33, "-v" is the flag you want.
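
The same check can be scripted. This is a minimal Python sketch that assumes the debug line has the "current version is N" wording quoted in this thread. `has_fix` is a hypothetical helper, and 1898196 is the revision from comment 19.

```python
import re

FIXED_REVISION = 1898196  # revision committed in comment 19

CURRENT_VERSION_RE = re.compile(r"current version is (\d+)")


def has_fix(debug_line, fixed_revision=FIXED_REVISION):
    """Return True if the installed rules version is at or past the fix."""
    match = CURRENT_VERSION_RE.search(debug_line)
    if match is None:
        raise ValueError("no 'current version is N' found in line")
    return int(match.group(1)) >= fixed_revision
```

For the debug line quoted above (`current version is 1898205, new version is 1898205`), `has_fix` returns True.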