Bug 7676 - mkupdates produces different files with the same filenames causing issues with caching proxies
Status: RESOLVED WORKSFORME
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Website/Infrastructure
Version: unspecified
Hardware: All All
Importance: P2 major
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:

Reported: 2019-01-05 09:20 UTC by Simon Arlott
Modified: 2022-04-19 13:27 UTC
CC List: 4 users



Description Simon Arlott 2019-01-05 09:20:08 UTC
https://wiki.apache.org/spamassassin/InfraNotes2017 documents when mkupdates is run:

At 02:25 it "creates ${REVISION}.tar.gz ${REVISION}.tar.gz.sha1 and ${REVISION}.tar.gz.asc in /var/www/automc.spamassassin.org/updates for mirrors to pull"
At 08:30 it "creates ${REVISION}.tar.gz ${REVISION}.tar.gz.sha1 and ${REVISION}.tar.gz.asc in /var/www/automc.spamassassin.org/updates for mirrors to pull"

Several mirrors do not serve content directly; they're behind Cloudflare's caching proxy.

User A downloads "${REVISION}.tar.gz" and "${REVISION}.tar.gz.sha1" at 03:00. They have GPG checking disabled, probably because of this issue.
They get the content of the 02:25 files and everything works.

User B downloads "${REVISION}.tar.gz", "${REVISION}.tar.gz.sha1" and "${REVISION}.tar.gz.asc" at 09:00.
Cloudflare has cached "${REVISION}.tar.gz" and "${REVISION}.tar.gz.sha1", so they get the content of the 02:25 files.
No one has requested "${REVISION}.tar.gz.asc" yet, so Cloudflare does not have it cached. They get the content of the 08:30 file, and the signature does not match the tarball.

The content differs between the two runs of mkupdates, but the filenames are the same because the revision hasn't changed. I don't know why that is necessary or desirable, but it does not interact well with caching proxies.
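To make the failure mode concrete, here is a toy model of the two-user scenario, assuming a proxy that caches each file on its first request and never revalidates (a simplification; real Cloudflare behaviour depends on TTLs and cache headers). The `publish` and `fetch` functions and the run labels are hypothetical and not part of mkupdates or sa-update.

```shell
#!/usr/bin/env bash
# Toy model only: "origin" is the mirror's docroot, "cache" is a proxy that
# stores each file on first request and never revalidates. Hypothetical names.
declare -A origin=() cache=()

# publish GEN: simulate an mkupdates run writing all three files for the
# same (unchanged) revision; GEN labels which run produced the content.
publish() {
  local f
  for f in tar.gz tar.gz.sha1 tar.gz.asc; do
    origin[$f]=$1
  done
}

# fetch FILE: print which run's content the client receives via the proxy.
# Call it directly (not inside $(...)) so the cache update persists.
fetch() {
  if [[ -z ${cache[$1]+set} ]]; then
    cache[$1]=${origin[$1]}   # cache miss: pull from origin and keep it
  fi
  printf '%s %s\n' "$1" "${cache[$1]}"
}

publish "02:25"
echo "User A at 03:00:"
fetch tar.gz        # cache miss: 02:25
fetch tar.gz.sha1   # cache miss: 02:25

publish "08:30"     # second run, same revision, same filenames
echo "User B at 09:00:"
fetch tar.gz        # served from cache: 02:25
fetch tar.gz.sha1   # served from cache: 02:25
fetch tar.gz.asc    # cache miss: 08:30, so it signs different content
```

User B ends up with a tarball and checksum from one run and a signature from another, which is exactly the mismatch described above.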

The workaround is to block access to Cloudflare, but then bug 7662 happens.
Comment 1 Henrik Krohns 2022-04-19 06:34:33 UTC
Actually this is not true.

At 02:25, mkupdate-with-scores creates and publishes the tar.gz.

At 08:30, run_nightly does NOT publish it again if mkupdate-with-scores has already done so, as this code shows:

  # Integrate with masscheck ruleset updates to prevent duplicates
  RECENT=`find $HOME/tmp/mkupdate-with-scores -name \*.tar.gz -mmin -480`
  if [[ -z "$RECENT" ]]; then
    echo "Recent ruleset from mkupdate-with-scores (massheck) NOT found."
    echo "Proceeding with a ruleset publish..."
    # ... (publish steps elided)
  else
    echo "Recent ruleset from mkupdate-with-scores (massheck) found:"
    ls -l $RECENT
    echo ""
  fi
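The `480` in `find ... -mmin -480` is a window in minutes (eight hours). A quick check of the schedule arithmetic, assuming the 02:25 tarball's modification time is roughly its scheduled start time and using the times from the InfraNotes page:

```shell
#!/usr/bin/env bash
# Schedule times from the InfraNotes page, as minutes after midnight.
masscheck_run=$(( 2 * 60 + 25 ))   # 02:25 -> 145
nightly_run=$(( 8 * 60 + 30 ))     # 08:30 -> 510
window=480                         # threshold used by find -mmin -480

gap=$(( nightly_run - masscheck_run ))
echo "gap=${gap} min, window=${window} min"

if (( gap < window )); then
  echo "02:25 tarball still counts as recent at 08:30: run_nightly skips publishing"
else
  echo "02:25 tarball is too old: run_nightly would publish again"
fi
```

The 365-minute gap sits well inside the 480-minute window, leaving a 115-minute margin for the 08:30 job starting late. If mkupdate-with-scores produced no tarball at all, `$RECENT` is empty and run_nightly proceeds with its own publish.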

The remaining question is whether run_nightly should ever publish at all (unnecessary duplicate code, etc.), but that's out of scope for this bug. Closing.
Comment 2 Simon Arlott 2022-04-19 07:09:14 UTC
It looks like this was changed after I first raised it:

------------------------------------------------------------------------
r1828938 | davej | 2018-04-11 22:44:28 +0100 (Wed, 11 Apr 2018) | 1 line

Add check to the rules promotion to prevent duplicate rulesets 6 hours apart with the same name.
------------------------------------------------------------------------
r1829141 | davej | 2018-04-14 15:07:01 +0100 (Sat, 14 Apr 2018) | 1 line

Had logic backwards for recent ruleset test from masscheck processing.
------------------------------------------------------------------------
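The check quoted in comment 1 keys on file age (`-mmin -480`). As an illustration only, a guard keyed on the published filename would state the invariant this bug asks for directly: never put different content under an existing `${REVISION}.tar.gz`. The `publish_once` function, its arguments and the use of `cmp` are hypothetical and not taken from the SpamAssassin infrastructure scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from run_nightly or mkupdate-with-scores:
# publish a ruleset tarball only if nothing is published under that name yet.
publish_once() {
  local src=$1 updates_dir=$2 revision=$3
  local dest="${updates_dir}/${revision}.tar.gz"

  if [[ -e $dest ]]; then
    if cmp -s "$src" "$dest"; then
      echo "r${revision}: identical tarball already published, skipping"
      return 0
    fi
    echo "r${revision}: a different tarball is already published under this name, refusing" >&2
    return 1
  fi

  # First publish for this revision. The .sha1 and .asc would be generated
  # from $dest afterwards (omitted in this sketch).
  cp "$src" "$dest"
  echo "published ${dest}"
}
```

Unlike a time window, a name-based check like this would also catch a second run outside any fixed schedule, at the cost of needing the published directory to be reachable from the job that builds the tarball.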

The documentation on https://cwiki.apache.org/confluence/display/SPAMASSASSIN/InfraNotes2020 still claims that it gets regenerated twice.
Comment 3 Sidney Markowitz 2022-04-19 13:26:23 UTC
I'll fix this on the wiki. I'll also change the status of this issue from worksforme to fixed in r1828938, since it actually was.
Comment 4 Sidney Markowitz 2022-04-19 13:27:39 UTC
And back to worksforme, since this bug was opened after the fix was done, as I now see after looking at the dates.