Bug 8068 - t/spamd_ssl_accept_fail.t failure on slow systems
Status: REOPENED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Regression Tests
Version: 4.0.0
Hardware: PC Linux
Importance: P2 minor
Target Milestone: 4.0.0
Assignee: SpamAssassin Developer Mailing List
Reported: 2022-10-24 22:13 UTC by Sidney Markowitz
Modified: 2023-10-20 17:32 UTC

Description Sidney Markowitz 2022-10-24 22:13:46 UTC
t/spamd_ssl_accept_fail.t was written to test the fix for bug 4107, in which spamd crashed when it was launched with the option to accept only ssl connections and spamc called it using a non-ssl protocol.

The test launches spamd with the ssl option, calls it once with spamc without ssl, then again with spamc using ssl, then stops spamd and confirms that the combined output of the two spamc calls contains the expected spam report from the second call.

In setting up testing on GitHub Actions, the test fails: the second spamc call does not appear to contact spamd, just like the non-ssl call.

If I add a sleep(1) between the first and second call of spamc, the test works.

I suspect that the fix for bug 4107 does not prevent the child process from crashing when it receives the non-ssl call from spamc. That is good enough as far as the fix goes, because the child gets re-spawned.

If I'm right about the cause of the test failure, it is only a problem when all of the following hold: spamd is running with ssl on a system too slow to be useful for production, spamc calls it without ssl from a machine that has access to the spamd port, and a proper ssl call from spamc follows less than a second later. Even then, the only result is that the second spamc call fails. If that is the only failure scenario, I think it would be fine to just add a sleep(1) to the test so that it can pass on slow systems like the GitHub Actions runner.
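As an illustration of an alternative to a fixed sleep, here is a minimal Python sketch of retrying the second call until it succeeds or a deadline passes. It is not SpamAssassin code: `retry_until` and `fake_ssl_spamc_call` are hypothetical names, and the fake call merely stands in for the ssl spamc invocation, assuming the first attempts fail while the spamd child is being re-spawned.

```python
import time


def retry_until(action, timeout=10.0, interval=0.25,
                clock=time.monotonic, sleep=time.sleep):
    """Call action() until it returns a truthy value or timeout seconds pass.

    Returns the first truthy result, or None if the deadline is reached.
    """
    deadline = clock() + timeout
    while True:
        result = action()
        if result:
            return result
        if clock() >= deadline:
            return None
        sleep(interval)


# Hypothetical stand-in for the second (ssl) spamc call: it returns an
# empty result twice, as if the spamd child were still being re-spawned,
# and then returns a report.
attempts = []


def fake_ssl_spamc_call():
    attempts.append(1)
    return "spam report" if len(attempts) >= 3 else ""


report = retry_until(fake_ssl_spamc_call, timeout=5.0, interval=0.01)
```

A retry loop of this kind returns as soon as the call succeeds on a fast system and only waits longer on a slow one, whereas a fixed delay has to be sized for the slowest runner.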

If anyone wants to look at the code and make a more robust fix for bug 4107 that keeps the child from crashing, feel free.

Since my proposed change is only in the test, I think I can commit it for 4.0.0 without a vote.
Comment 1 Sidney Markowitz 2022-10-24 22:21:55 UTC
trunk % svn ci -m "bug 8068 - Add a delay in the test to allow for some slow test systems" t/spamd_ssl_accept_fail.t 
Sending        t/spamd_ssl_accept_fail.t
Transmitting file data .done
Committing transaction...
Committed revision 1904818.
Comment 2 Alyssa Ross 2023-10-20 08:51:41 UTC
In Nixpkgs we are still seeing intermittent failures of this test on systems under heavy load with SpamAssassin 4.0.0.

e.g. https://hydra.nixos.org/build/238455345/nixlog/2
Comment 3 Bill Cole 2023-10-20 17:32:11 UTC
(In reply to Alyssa Ross from comment #2)
> In Nixpkgs we are still seeing intermittent failures of this test on systems
> under heavy load with SpamAssassin 4.0.0.
> 
> e.g. https://hydra.nixos.org/build/238455345/nixlog/2

The fix for this test failure (see revision 1904818) was simply adding a 1-second 'sleep' delay. I don't think it would be terrible to increase that delay a little, but it would be good to have an idea of how long it needs to be.

Are you able to reproduce this reliably enough that you could test different delays to see how long it needs to be? I'd see no issue with bumping it to 5 or maybe even 10 seconds, but I don't have any way to reproduce the failure mode of a heavily-loaded build system.
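One way to approach that measurement, sketched in Python under stated assumptions: `run_test` is a hypothetical callable that runs the test once with the given delay and returns True on a pass. How the delay would actually be passed to the test is not specified here, and the stand-in below only simulates a system that needs at least 3 seconds.

```python
def smallest_reliable_delay(run_test, delays, trials=20):
    """Return the smallest delay for which run_test(delay) passes in
    every one of `trials` runs, or None if no candidate is reliable."""
    for delay in sorted(delays):
        if all(run_test(delay) for _ in range(trials)):
            return delay
    return None


# Hypothetical stand-in for a loaded build system on which the test
# only passes once the delay reaches 3 seconds.
def simulated_run(delay):
    return delay >= 3


needed = smallest_reliable_delay(simulated_run, [1, 2, 3, 5, 10], trials=5)
```

Because the failures are intermittent, the number of trials per delay matters more than the number of candidate delays: a delay that passes only a handful of runs may still fail under heavier load.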