SA Bugzilla – Bug 4912
RFE: Maintain a SpamAssassin corpus of messages
Last modified: 2019-07-08 10:41:10 UTC
This was a suggested idea for the Google Summer of Code 2006; I'm adding it to the bugzilla for future use, and in case anyone feels like implementing it.

Subject ID: spamassassin-corpus

Keywords: corpora, mail, collection, perl, community

Description: Theo said: 'I'd almost rather we shift this around and make a "SpamAssassin Corpora", have all of us focus on making that the best it can be, and use that for mass-checks, etc.' This could be a good possibility. Contributors can upload their own mail corpora to a central web app where the mass-check occurs. The mail collections could be quickly checked for validity, and tagged based on how much privacy the user wants for their mails (therefore controlling further redistribution of those mails). Related to 'spamassassin-easy-mass-check' above.

Possible Mentors: Justin Mason (jm at jmason.org), Theo Van Dinter (felicity-at-apache.org)
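For anyone picking this up, the validity check and privacy tagging described above might look roughly like the following. This is only a sketch in Python (not the project's Perl) and not anything that exists in SpamAssassin: the names `Privacy`, `CorpusEntry`, `is_plausible_message`, `accept_upload` and `may_redistribute` are all hypothetical, and the choice of "has a From and a Date header" as the validity test is an assumption made purely for illustration.

```python
from dataclasses import dataclass
from email import message_from_bytes
from enum import Enum
from typing import Optional


class Privacy(Enum):
    """Hypothetical privacy levels a contributor could pick at upload time."""
    PUBLIC = "public"          # raw message may be redistributed
    RESTRICTED = "restricted"  # raw message visible to other contributors only
    PRIVATE = "private"        # only mass-check results are ever shared


# Assumption: a message needs at least these headers to count as "valid".
REQUIRED_HEADERS = ("From", "Date")


def is_plausible_message(raw: bytes) -> bool:
    """Cheap sanity check: non-empty and carries the required headers."""
    if not raw.strip():
        return False
    msg = message_from_bytes(raw)
    return all(msg.get(name) for name in REQUIRED_HEADERS)


@dataclass
class CorpusEntry:
    raw: bytes
    is_spam: bool
    privacy: Privacy


def accept_upload(raw: bytes, is_spam: bool, privacy: Privacy) -> Optional[CorpusEntry]:
    """Return a tagged corpus entry, or None if the upload fails the check."""
    if not is_plausible_message(raw):
        return None
    return CorpusEntry(raw=raw, is_spam=is_spam, privacy=privacy)


def may_redistribute(entry: CorpusEntry) -> bool:
    """Only messages explicitly tagged PUBLIC may leave the central server."""
    return entry.privacy is Privacy.PUBLIC
```

A real implementation would need a stricter validity test (spam frequently has missing or malformed headers) and would have to enforce the privacy tag wherever messages are stored or served, not just at upload.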
Note, btw, that the zone nowadays includes a large corpus of messages from 3-4 contributors. It's not public per se, but it is uploaded regularly to the zone. Mass-check results are visible as "bb-foo" at http://ruleqa.spamassassin.org/.
Closing old stale bug. I think there is no point in a centralized corpus / mass-check server these days; it's also a can of worms for privacy. Mass-checkers do just fine locally.