Bug 796 - add hashcash support
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamassassin
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
Importance: P4 enhancement
Target Milestone: 2.70
Assignee: SpamAssassin Developer Mailing List
Depends on: 1041
Reported: 2002-08-30 20:09 UTC by Daniel Quinlan
Modified: 2003-12-14 15:23 UTC
Description Daniel Quinlan 2002-08-30 20:09:43 UTC
Tracking ticket for adding support for hashcash to SpamAssassin.

For more information:

  http://www.camram.org/
  www.cypherspace.org/~adam/hashcash/

I'm not sure, but we might want to avoid using external software if possible
(aside from Digest::SHA1).  It also looks like rule/score ranges are desirable
since cash value varies (age, number of recipients, length of bit-sequence, etc.)
Comment 1 Daniel Quinlan 2002-08-30 23:35:35 UTC
Tony L. Svanstrom is working on this ticket.
Comment 2 Justin Mason 2002-09-04 06:29:49 UTC
good luck Tony! this would be cool.  refiling as enhancement ;)
Comment 3 Tony L. Svanstrom 2002-09-04 08:48:51 UTC
Subject: Re: [SAdev]  add hashcash support

On Wed, 4 Sep 2002 the voices made bugzilla-daemon@hughes-family.org write:

> ------- Additional Comments From jm@jmason.org  2002-09-04 06:29 -------
> good luck Tony! this would be cool.  refiling as enhancement ;)

 I'm aiming at the next major release, mostly due to not having as much time as
I'd like to work on this.
 The major problem isn't getting the code to work though, it's checking the
format of the "tags" and current implementations IRL.


	/Tony
Comment 4 Simon Josefsson 2002-09-29 07:07:08 UTC
How is this progressing?  I'd like to add support for this now unless someone is already working on it.
Comment 5 Tony L. Svanstrom 2002-09-29 08:12:30 UTC
Subject: Re: [SAdev]  add hashcash support

On Sun, 29 Sep 2002 the voices made bugzilla-daemon@hughes-family.org write:

> http://www.hughes-family.org/bugzilla/show_bug.cgi?id=796

> ------- Additional Comments From jas@extundo.com  2002-09-29 07:07 -------
> How is this progressing?  I'd like to add support for this now unless someone is already working on it.

 If you were to add it right now then it'd just be "hashcash a la SA", because
there's no accepted standard; which more or less makes it useless as a part of
SA.
 shird@dstc.edu.au is working on it though, and seems to be part of all the
standards talk that's going on...


	/Tony
Comment 6 Shane Hird 2002-09-29 18:45:42 UTC
For further information on getting HashCash into SA, take a look at the CamRam 
mailing list (http://www.camram.org, or at news.gmane.org : 
gmane.mail.spam.camram).

What is needed from the SA side of things is a way to pass the intended
recipient on to the test. I've already got a rough implementation working, but as
Tony said, it doesn't conform to any set standard, because there isn't really any.
Comment 7 Simon Josefsson 2002-10-07 10:39:41 UTC
Subject: Re:  add hashcash support

bugzilla-daemon@hughes-family.org writes:

> For further information on getting HashCash into SA, take a look at the CamRam 
> mailing list (http://www.camram.org, or at news.gmane.org : 
> gmane.mail.spam.camram).

Yup.  Gnus supports these headers, and I have been sending them for a
while.

> What is needed from the SA side of things, is a way to pass the
> intended recipient onto the test. 

You mean the SMTP envelope destination?  I'm not sure this will work:
from what I hear, the way this should work on mailing lists is that you
mint a coin for the mailing list itself.  This has some problems, but
I think they are lesser than the problems in having the mailing list
server mint coins for each mailing list member (which is infeasible
even for a small number of members).  Using the contents of the To:
header seems like a better solution to me.

> I've already got a rough implementation working, but as Tony said, it
> doesn't conform to any set standard, because there isn't really any.

Adam Back has a tool that generates X-Hashcash: headers, is there a
problem in using that format?  Most of what SpamAssassin does is not
formally standardized, so I don't understand why adding pragmatic
support for existing headers (X-Hashcash:) that prevents spam would
hurt.

Comment 8 Shane Hird 2002-10-07 17:52:08 UTC
The problem of using the 'To:' header is it is easily forged. I could send out 
a million messages to different people all with the same 'To:' header and use 
the same token for each person.

It is ideal to use the 'To:' header, but you also need a way to verify that it 
is acceptable. I have done up a quick implementation attached to bug 1041, in 
which you are able to make a list of 'valid_to' addresses/resources. I haven't
tested it, but I imagine you could have something like:

valid_to     john*@spamgourmet.com

and it would match on 'john.3.random@spamgourmet.com' (if you understand how 
spamgourmet works). Another approach is to use 'automatic To: learning' (or 
something) as noted by Daniel in bug 1041. This seems like the ideal approach.
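
As a rough illustration of the 'valid_to' idea above, here is a minimal Python sketch of shell-style glob matching against a list of accepted resources. The function name and the case-folding are assumptions made for the example; they are not taken from the code attached to bug 1041.

```python
from fnmatch import fnmatchcase

def resource_is_accepted(resource, patterns):
    """Return True if the stamp's resource matches any accepted glob pattern.

    Both sides are lower-cased first (an assumption of this sketch) so the
    comparison does not depend on how the sender capitalised the address.
    """
    resource = resource.lower()
    return any(fnmatchcase(resource, pattern.lower()) for pattern in patterns)

# Hypothetical configuration holding the single pattern from the comment above.
valid_to = ["john*@spamgourmet.com"]

print(resource_is_accepted("john.3.random@spamgourmet.com", valid_to))
print(resource_is_accepted("mary@spamgourmet.com", valid_to))
```

The first call matches the pattern and the second does not, which is the spamgourmet behaviour described above.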

The idea of using the 'X-Envelope-To' header doesn't appeal to me much, because
MTAs all use different headers, or none at all, which would require you to
tailor SA to your MTA/MDA, and make sure it strips out any forged ones it
finds. It also doesn't allow you to make use of a forwarding service such as
spamgourmet, because the envelope-to would differ from the 'To:' header, which
would be used for the token and may be acceptable.
Comment 9 Shane Hird 2002-10-07 18:00:03 UTC
BTW: The implementation I have included with bug 1041 is compatible with Adam 
Back's hashcash tool (albeit not fully implemented). I also used 'X-HashCash:' 
instead of 'X-Hashcash' (uppercase 'C'). But it needs to be mostly re-written 
anyway.

Also, concerning mailing lists, this also allows you to use something like:

valid_to       bugtraq_list

And bugtraq could then send out mail using tokens which are generated against 
this resource ('bugtraq_list'). You may also want to include the value required 
for that resource.

A better name may be 'valid_resource'. What you could then do is use 
an 'automatic valid token resource learning' technique, instead of learning 
valid 'To:' fields. This could only be done once hashcash becomes commonplace
though.

Is anyone working on an 'automatic To: learning' type system? Is it already
implemented and I don't even know about it? :P
Comment 10 Daniel Quinlan 2002-10-07 19:10:10 UTC
Subject: Re:  add hashcash support

shird@dstc.edu.au writes:

> The idea of using the 'X-Envelope-To' header doesn't appeal to me
> much, because MTAs all use different headers, or none at all, which
> would require you to tailor SA to your MTA/MDA, and make sure it
> strips out any forged ones it finds. It also doesn't allow you to
> make use of a forwarding service such as spamgourmet, because the
> envelope-to would differ from the 'To:' header, which would be used
> for the token and may be acceptable.

Envelope headers (such as exim's "Envelope-to:") are the best way to
handle this (well, part of the best way -- see below).  Everything else
is a non-optimal solution entailing configuration maintenance or
heuristics.  It's trivial to support the envelope header for the most
commonly used MTAs (sendmail, exim, postfix, qmail, etc.).

Forgery is impossible.  The MTA strips out any present envelope header
and adds its own.  In addition, the final (top-most) Received header is
always added by the last receiver and will indicate the MTA in use.

In addition (this is the other part), I neglected to mention that the
final Received header also often includes the envelope header address:

  Received: from relay07.indigo.ie ([194.125.133.231])
	  by proton.pathname.com with smtp (Exim 3.35 #1 (Debian))
	  id 17yexF-0005Sk-00
	  for <quinlan@pathname.com>; Mon, 07 Oct 2002 13:58:17 -0700

I'd probably try for that first since it's most common and easiest.
Then, fall-back on MTA-specific headers.

Yes, if you use a forwarding service, then you will have to rely on a
heuristic or configuration setting, but that is not an argument
against using the envelope header when it is present.
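
For illustration only, a small Python sketch of pulling the envelope recipient out of the "for <address>;" clause of a Received header like the one quoted above. The regular expression and function name are assumptions of this sketch, not SpamAssassin code, and it should only ever be fed the top-most Received header, since lower ones can be forged.

```python
import re

# Matches the "for <address>;" clause that some MTAs add to a Received header.
FOR_CLAUSE = re.compile(r"\bfor\s+<([^<>\s]+@[^<>\s]+)>\s*;", re.IGNORECASE)

def envelope_recipient(top_received):
    """Return the address in the 'for <...>;' clause, or None if there is none."""
    unfolded = " ".join(top_received.split())  # undo header line folding
    match = FOR_CLAUSE.search(unfolded)
    return match.group(1) if match else None

# The header value from the comment above (without the "Received:" name).
received = """from relay07.indigo.ie ([194.125.133.231])
\t  by proton.pathname.com with smtp (Exim 3.35 #1 (Debian))
\t  id 17yexF-0005Sk-00
\t  for <quinlan@pathname.com>; Mon, 07 Oct 2002 13:58:17 -0700"""

print(envelope_recipient(received))
```

When the clause is missing the function returns None, which is the point where a fall-back to MTA-specific headers would take over.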

Comment 11 Simon Josefsson 2002-10-09 12:52:13 UTC
Subject: Re:  add hashcash support

bugzilla-daemon@hughes-family.org writes:

> The problem of using the 'To:' header is it is easily forged. I
> could send out a million messages to different people all with the
> same 'To:' header and use the same token for each person.
>
> It is ideal to use the 'To:' header, but you also need a way to
> verify that it is acceptable. I have done up a quick implementation
> attached to bug 1041, in which you are able to make a list of
> 'valid_to' addresses/resources. I haven't tested it, but I imagine
> you could have something like:
>
> valid_to     john*@spamgourmet.com

This seems like a good approach.  I'll see if I can get your
implementation to work...

> The idea of using the 'X-Envelope-To' header doesn't appeal to me
> much, because MTAs all use different headers, or none at all, which
> would require you to tailor SA to your MTA/MDA, and make sure it
> strips out any forged ones it finds. It also doesn't allow you to
> make use of a forwarding service such as spamgourmet, because the
> envelope-to would differ from the 'To:' header, which would be used
> for the token and may be acceptable.

In general, using the envelope address has similar problems as using
the To: header -- I receive mail for many SMTP envelope destinations,
but I only consider one or two my "real" mail address, which is what I
would put in "valid_to" or "valid_resources" or whatever it is
called.

But using X-Envelope-To or parsing Received: lines doesn't seem all
that ugly, considering the amount of guessing SA already does.  Of
course, a real solution could be to add a command line parameter or a
spamd protocol part that tells SA what the real envelope address is.
If the envelope address parsing works, I think SA should default to
scoring up mail with hashcash cookies minted for the SMTP envelope
address.

Comment 12 Justin Mason 2002-10-09 15:04:18 UTC
3 points:

'If the envelope address parsing works, I think SA should default to
scoring up mail with hashcash cookies minted for the SMTP envelope
address.' Sounds good -- and if valid_to is set, then use that instead.

BTW I have comments in EvalTests (iirc) detailing which -To headers
are added by which MTAs.  I nicked it from a procmail page somewhere ;)
So that should provide guidelines for other headers to look at as well
as Envelope-To.

In addition, I strongly support extending the spamc and spamassassin
switch-set to allow the MTA to indicate to Mail::SpamAssassin what the
user's envelope address was, via a command-line switch.  We have something
like this now, "-u", but that only acts for the *username*; in a
virtual-hosting env, jm@jmason.org could be a totally different user
from jm@whatever.com.  (I think there's another bug in the zilla somewhere,
where someone's asked for this to be supported for virtual-configuration
stuff in spamd as well.)



Comment 13 Shane Hird 2002-10-09 18:16:01 UTC
"and if valid_to is set, then use that instead"

valid_to, at least the way I have done it, allows you to specify several 
acceptable addresses, so may want to use it additionally rather than 
exclusively. Although if I'm wrong below, I could see how some people may want 
to use it that way.

"In general, using the envelope address has similar problems as using
the To: header -- I receive mail for many SMTP envelope destinations"

I think it is quite rare/impossible that you would share an envelope address
with someone else (correct me if I'm wrong) - so it should be unique to you, and
therefore should be ok to use in checking the token. The way I've done it checks
a list of acceptable resources, so if one doesn't match, it tries the others in
the list. So there shouldn't be any problem with using the envelope dest if it
is available. Keep in mind that what people use for generating the token will 
typically be the address they put in the 'To:' (or CC: etc) header.

"We have something like this now, "-u", but that only acts for the *username*;
in a virtual-hosting env, jm@jmason.org could be a totally different user
from jm@whatever.com."

Because you have the -u switch, and/or spamc/d runs as the dest user, you could 
also look that user up in a DB (rather than using user_prefs) to check for 
valid addresses. (Doesn't the AWL work like this when it doesn't have access to
a user's $HOME?). I haven't played with the SQL features of SA, but I will try
to take a look and see if it can be extended to support this.
Comment 14 Tony L. Svanstrom 2002-10-09 18:44:03 UTC
Subject: Re: [SAdev]  add hashcash support

> ------- Additional Comments From shird@dstc.edu.au  2002-10-09 18:16 -------

> I think it is quite rare/impossible that you would share an envelope address
> with someone else (correct me if I'm wrong)

 Some use a "domainwide" mailbox, and then they use a fetchmail(ish) solution
to get the mail and distribute it locally; I've seen all kinds of weird
solutions like that, with all kinds of local headers used/not used.


	/Tony
Comment 15 Shane Hird 2002-10-09 19:13:41 UTC
"Some use a "domainwide" mailbox, and then they use a fetchmail(ish) solution"

That sounds pretty kludgey, but given that, I assume it would be difficult to
predict the envelope-to address to be able to abuse/use it - and would be rare
enough that spammers wouldn't rely on it. So it should still be ok to use.

Legit users also wouldn't be able to use this address, so it highlights the need
to have an additional list of valid to addresses - or some other way of
determining valid addresses.

Justin wrote:
"BTW I have comments in EvalTests (iirc) detailing which -To headers
are added by which MTAs."

If you are going to extract the env address from the headers, you would also
need a way to tell SA which headers to use - you couldn't just check for all of
them because they could be forged.
Comment 16 Shane Hird 2002-10-09 19:29:25 UTC
"We have something like this now, "-u", but that only acts for the *username*;
in a virtual-hosting env, jm@jmason.org could be a totally different user
from jm@whatever.com."

I see what you're getting at here. I guess you could have a global list of
acceptable domains, then try the combinations of jm@$domain to see if one
matches. Theoretically this would allow you to spam all jm's at different
virtual domains - but that's not much of an issue (except for
maybe 'webmaster'). Also, my username may be 'shane', but my envelope addresses
are different - for a lot of people it is the same though.
Comment 17 Duncan Findlay 2002-12-13 19:55:29 UTC
Is anyone progressing with this?
Comment 18 Theo Van Dinter 2002-12-23 22:08:48 UTC
it sounds like there's not a lot of work going on about this.  so I'm setting
the target milestone to 3.0.
Comment 19 Simon Josefsson 2003-01-03 15:50:16 UTC
Subject: Re:  add hashcash support

bugzilla-daemon@hughes-family.org writes:

> it sounds like there's not a lot of work going on about this.  so I'm setting
> the target milestone to 3.0.

Shane Hird did some work [1] with this, wouldn't it be possible to add
that to SpamAssassin?  Even if it is not an Internet Standard (which I
think it never will be) it will prevent spam.

[1] http://www.camram.org/mhonarc/spam/msg00550.html,
http://bugzilla.spamassassin.org/show_bug.cgi?id=1041

Comment 20 Tony L. Svanstrom 2003-01-03 16:04:50 UTC
Subject: Re: [SAdev]  add hashcash support

On Fri, 3 Jan 2003 the voices made bugzilla-daemon@hughes-family.org write:

> ------- Additional Comments From jas@extundo.com  2003-01-03 15:50 -------
> Subject: Re:  add hashcash support
>
> bugzilla-daemon@hughes-family.org writes:
>
> > it sounds like there's not a lot of work going on about this.  so I'm setting
> > the target milestone to 3.0.
>
> Shane Hird did some work [1] with this, wouldn't it be possible to add
> that to SpamAssassin?  Even if it is not an Internet Standard (which I
> think it never will be) it will prevent spam.

 It's not worth it, there's no standard and not many are using it, so it's just
a waste of resource...


	/t
Comment 21 Justin Mason 2003-02-03 05:18:17 UTC
Tony -- our position (well me and Theo at least) is that if we can encourage use
of an auth system (like hashcash), we should.  At least it would mean our mails
would be whitelisted ;)

I think we could get this in once 2.60 starts up.  What I propose is this.

- we incorporate "valid_to" from bug 1041.  Rename it to
  "hashcash_resource_accept",
  since it's impl-specific and I don't want to require valid To addresses
  for any other tests.

  We *do* need this for hashcash, AFAICS, since (a) some versions of sendmail
  do NOT leave the envelope To addr in the mail message anywhere :( and (b)
  consider the "spamgourmet" case, where the user wants to accept tokens
  for resources on other machines with (possibly) totally different names,
  since they know those addrs forward to the current addr.

- we also figure out the envelope To addr using code we have in EvalTests
  already.  This just needs to be abstracted a little.  Then we can use the
  envelope-To as the default setting for "hashcash_resource_accept".

- Shane -- what's the current situation with Hashcash usage and
  "de-facto standardization"? ;)  Is there a "safe" protocol that will probably
  work?

- Also, I'll need a patch, none of this "copies of files" rubbish ;)


Comment 22 Adam Back 2003-02-10 12:36:34 UTC
Note the Shane Hird code has one problem: it is using 111111111...111b as the
challenge, and the hashcash library code uses 0000000...000b as the challenge.

The 0000000...000b is the correct string.

Adam
Comment 23 Daniel Quinlan 2003-05-18 21:40:34 UTC
moving a bunch of bugs to 2.70 milestone
Comment 24 Simon Lyall 2003-07-13 04:15:22 UTC
I was at a conference this week (talking about spam) and ended up talking to
David Harris, who is the author of the popular Windows mail client Pegasus Mail
(http://www.pmail.com/). I told him about hashcash and he was interested in the
idea (with various reservations) but I'm a bit dubious about recommending he use
it unless it's going to be supported elsewhere.

It would appear to me that the basic system as defined by Adam Back is pretty
good and has the virtue of being simple to implement and use. The big problem
would seem to be at the double spend level. I'm hoping that double spends can be
blocked by a combination of a local database and a hook into DCC or similar.

It would also appear to be straightforward for me (as an ISP admin) to add a
little program to append this header on outgoing email from customers (up to
their first 20 emails per day). In that case it might be generated with
postmaster@ihug.co.nz as the "resource" rather than the From address however.

From spamassassin we could initially do something like below to start with:

header   HASHCASH_20      eval:check_hashcash('20')
describe HASHCASH_20      Hashcash match 20 bits or more
score    HASHCASH_20      -3.0

header   HASHCASH_LOCAL   eval:hashcash_db_lookup()
describe HASHCASH_LOCAL   Hashcash already in local database
score    HASHCASH_LOCAL   6.0


What do people think? Hashcash appears to be a good idea but needs an initial
push. Pegasus Mail and a simple script for MTAs to generate the headers, along
with spamassassin looking for them, would seem to be a good push. An initial local
collision database and a later hook into DCC or Razor will ensure that spammers
won't be able to forge it wholesale without being detected at the vast majority
of sites.

(a) Is it safe to recommend the:

X-Hashcash:0:030713:simon@darkmere.gen.nz:911d15251bc1a8a5

format? (perhaps with a space at the start of the header).

(b) Do people feel a one week expire time for hashcash headers is good? This
would enable MUAs/MTAs to create them in advance but ensure that databases
like DCC don't have to remember them forever.
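
To make questions (a) and (b) concrete, here is a hedged Python sketch that splits a stamp of the form shown above into fields and checks its age. The field names (version, date, resource, random) and the seven-day default are assumptions of this sketch based on the example header and the expiry proposed here; no specification is quoted in this thread.

```python
from datetime import date, datetime, timedelta

def parse_stamp(header_value):
    """Split a 'version:YYMMDD:resource:random' stamp into named fields.

    Raises ValueError if the value does not have that shape.
    """
    version, stamp_date, rest = header_value.strip().split(":", 2)
    resource, _, random_part = rest.rpartition(":")
    if not resource:
        raise ValueError("expected version:date:resource:random")
    minted = datetime.strptime(stamp_date, "%y%m%d").date()
    return {"version": version, "minted": minted,
            "resource": resource, "random": random_part}

def is_expired(minted, today, max_age_days=7):
    """True if the stamp was minted more than max_age_days before today."""
    return today - minted > timedelta(days=max_age_days)

stamp = parse_stamp(" 0:030713:simon@darkmere.gen.nz:911d15251bc1a8a5")
print(stamp["resource"], stamp["minted"])
print(is_expired(stamp["minted"], today=date(2003, 7, 21)))
```

Stripping the value before splitting means the optional space after the header's colon does not matter to the parser.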
Comment 25 Adam Back 2003-11-03 20:55:02 UTC
simon.lyall@ihug.co.nz wrote:
> [...]
> It would also appear to be straightforward for me (as 
> an ISP admin) to add a little program to append this 
> header on outgoing email from customers (up to their 
> first 20 emails per day). In that case it might be 
> generated with postmaster@ihug.co.nz as the 
> "resource" rather than the From address however.

Minor nit: the resource should be the recipient's address.  So postmaster at 
sending site would in that context not be accepted by the recipient as he's 
expecting an address that he's willing to accept mail for in the resource 
string.  (He can only accept at his own addresses or people can re-use tokens 
by sending the same token to multiple people).

> Is it safe to recommend the:
> 
> X-Hashcash:0:030713:simon@darkmere.gen.nz:911d15251bc1a8a5
> 
> format? (perhaps with a space at the start of the header).

The missing space was a bug introduced in hashcash-0.26, it was fixed in 0.27, 
so if you get that or 0.28 it should be back to as before.

> (b) Do people feel a one week expire time for hashcash
> headers is good? This would enable MUAs/MTAs to create
> them in advance but ensure that databases
> like DCC don't have to remember them forever.

it's a tradeoff between storage and reliability.  If mail
happens to get delayed for longer than 1 week, it loses
its hashcash scoring as the hashcash will be considered
invalid as "expired".  I had suggested 28 days (that
was introduced as a default in one of the more recent
versions).  But it's a matter of taste and how you 
interpret mail delivery semantics plus a fudge factor.

I would think the storage costs of a hashcash token per
received mail ought to be fairly low compared to the
other things an MTA is storing but I don't have the stats
to back that up.

I'm not sure about the software framework so don't know
what DCC is, but couldn't one just plug a Berkeley DB
in under the hashcash tool's DB layer, or if using the Perl hashcash
module, use Perl's tied hashes to store in a Berkeley DB?
Comment 26 Justin Mason 2003-11-03 22:37:10 UTC
BTW, I think it might be worth getting this in, using:

- "hashcash_resource_accept" = resource to accept hash-cash tokens for.
  For now, that should be just an email address or list of addresses,
  same kind of semantics as "whitelist_to" or similar (including the globbing!
  I want to receive all addrs @jmason.org, for example.)

- X-Hashcash: header should always have a space after the first colon

- one-week expiry time, at the most, and no DCC server; just a local .db file,
  in "~/.spamassassin/hashcash_seen".   The DCC server is not necessary,
  if we're using the hashcash_resource_accept idea above, since an incoming
  token will not be accepted by another server unless the resource matches.

Comment 27 Simon Lyall 2003-11-04 00:22:42 UTC
Justin Mason  wrote:
> "hashcash_resource_accept" = resource to accept hash-cash tokens for.
> For now, that should be just an email address or list of addresses,
>  same kind of semantics as "whitelist_to" or similar (including the globbing!
>  I want to receive all addrs @jmason.org, for example.)

Sounds good, I'd probably configure my servers to accept any token, since with
thousands of users I can't determine what domain/address the customer might be
receiving email addressed to (after multiple forwards, BCCs, emails lists etc,
some outside my control).

> one-week expiry time, at the most, and no DCC server; just a local .db file,
> in "~/.spamassassin/hashcash_seen".   The DCC server is not necessary,
> if we're using the hashcash_resource_accept idea above, since an incoming
> token will not be accepted by another server unless the resource matches.

Local db file sounds good. It'll do 90% of the job a distributed one would with
10% of the work. If people start swapping hashes later then support can be added.
Comment 28 Justin Mason 2003-11-08 12:20:02 UTC
update: new CPAN module:

       use Digest::Hashcash;
       $prefix = $cipher->verify($token [, param => value...])

           Checks the given token and returns true if the token has the minimum
           number of prefix bits, or false otherwise.  The value returned
           is actually the number of collisions, so to find the number of
           collision bits specify "collisions => 0".

           Any additional parameters are interpreted the same way as arguments
           to "new".

This looks very doable for 2.70 ;)
Comment 29 Justin Mason 2003-11-08 12:25:59 UTC
Here's a potential problem with the idea of penalizing duplicate messages using
the same stamp; consider a message

  From: foo@example.com
  To: jm@example.com, spamassassin-talk@Example.com
  X-Hashcash: [valid token for jm@example.com]

where "spamassassin-talk" is a list of which "jm" is a member.

"jm" will get two copies -- one via SMTP from foo's server to jm's server,
and later, one via SMTP from the sa-talk server to jm's server, possibly even
with modifications to the body.

If we were to simply record hashcash tokens to block replays, we'd wind up
penalizing the list mail since it bears the same token.

Suggestions?  (preferably suggestions that don't involve taking a hash of the
mail headers and parts of the body ;)
Comment 30 Adam Back 2003-11-08 12:46:09 UTC
Yes suggestion: both of the addresses should have a X-Hashcash header.  So 
spend only one of them if there are two that you are willing to accept for.

(There is one header per recipient, not one per mail; see for example the emacs 
client behavior).

(ie if you are on the above mentioned mailing list your 
hashcash_resource_accept string will include that address also.)

Adam
Comment 31 Justin Mason 2003-11-08 18:24:50 UTC
> (ie if you are on the above mentioned mailing list your 
> hashcash_resource_accept string will include that address also.)

I don't think that'll work, unfortunately.  Prior experience has shown
that requiring users to manually list their mailing list subscriptions,
as would be required here, will not fly -- it's just too much work to do.

Also, SA's approach is to require minimal hand-configuration...

We can work around by not penalizing double-spending, instead just ignoring
the double-spent token.  At least that way the spammer gets no benefit.
Comment 32 Adam Back 2003-11-08 20:04:14 UTC
Sure you could do that as a default for people who do not put their list 
subscriptions in.  If I understand you're saying: if the stamp is not for an 
address I receive mail as pretend it's not there for scoring purposes.

People who do fill in the full set of what they receive mail as will get better 
accuracy and lower risk of false positives for their mailing list traffic.

But I'm not sure why the token would be double spent.

Let's say (your example, except I put in the missing token which a hashcash 
client would put in, one per recipient):

  From: foo@example.com
  To: jm@example.com, spamassassin-talk@Example.com
  X-Hashcash: [valid token for jm@example.com]
  X-Hashcash: [valid token for spamassassin-talk@example.com]

And jm is lazy and doesn't bother with adding his list subscriptions.

That means spamassassin infers his address as jm@example.com, so 

  hashcash_resource_accept = jm@example.com

spamassassin takes the first stamp and marks it as double-spent.  The 2nd token 
is ignored.

The mail will get delivered twice.  Whichever arrives first will result in the 
hashcash stamp for jm@example.com being considered spent.  Whichever arrives 
2nd will get no +ve (or -ve) scoring from the token.

If the user sets hashcash_resource_accept to:

  hashcash_resource_accept = jm@example.com,spamassassin-talk@example.com

both copies of the mail will get +ve scoring from the corresponding hashcash 
stamp.

If the user sets hashcash_resource_accept to:

  hashcash_resource_accept = *@example.com

again they'll both be considered valid.  (Take only the 1st valid stamp and 
consider it double spent.  Rule: 1 stamp per received copy of a mail.)
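
The "1 stamp per received copy" rule can be sketched in Python as below. This is only an illustration under simplifying assumptions: stamps are passed in as (resource, token) pairs, bit-count and expiry checks are left out, and the function name and the set used for spent tokens are hypothetical.

```python
from fnmatch import fnmatchcase

def credit_one_stamp(stamps, accept, spent):
    """Pick at most one stamp to credit for this copy of the mail.

    stamps: list of (resource, token) pairs, one per X-Hashcash header
    accept: glob patterns for addresses the user receives mail as
    spent:  set of tokens already credited (updated in place)
    Returns the credited token, or None if no stamp earns credit.
    """
    for resource, token in stamps:
        if not any(fnmatchcase(resource.lower(), p.lower()) for p in accept):
            continue  # not minted for an address we accept
        if token in spent:
            continue  # used by an earlier copy: ignore it, no penalty
        spent.add(token)
        return token
    return None

mail = [("jm@example.com", "token-jm"),
        ("spamassassin-talk@example.com", "token-list")]

spent = set()
print(credit_one_stamp(mail, ["jm@example.com"], spent))  # first copy
print(credit_one_stamp(mail, ["jm@example.com"], spent))  # list copy
```

With only jm@example.com accepted, the first copy is credited and the second gets nothing; with both addresses (or *@example.com) accepted, each copy finds one unspent stamp.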

btw you might want something more mnemonic than resource -- eg 
hashcash_my_addresses or hashcash_valid_addresses_for_me.  Whatever you think.  
The "resource" terminology from hashcash is generalised beyond email addresses 
to web pages, web servers, IP addresses etc to protect any "resource", and I've 
found people find it confusing as applied to email.

Adam
Comment 33 Adam Back 2003-11-08 20:05:06 UTC
(add me to Cc list.)
Comment 34 Justin Mason 2003-11-08 20:27:11 UTC
> The mail will get delivered twice.  Whichever arrives first will result in the 
> hashcash stamp for jm@example.com being considered spent.  Whichever arrives 
> 2nd will get no +ve (or -ve) scoring from the token.

Yes, that sounds good.   The reason I raised it, is because there was talk of
considering a double-spend event something that should be penalized (+ve SA
score).  This example is a situation where a double-spend could occur without
any spammer trickery involved, which would indicate that penalizing
double-spends would be a bad idea.

> btw you might want something more mnemonic than resource -- eg 
> hashcash_my_addresses or hashcash_valid_addresses_for_me.  Whatever you think.  
> The "resource" terminology from hashcash is generalised beyond email addresses 
> to web pages, web servers, IP addresses etc to protect any "resource", and
> I've found people find it confusing as applied to email.

Good idea -- it does seem clear that in the hashcash scheme, "resource" does map
to "To address that may deliver to me".   That's good news, as it's pretty
simple (once the "resource" term is avoided), and will be possibly useful for
other systems as well as hashcash.

I think the proposed naming (from earlier in this bug's history) of this setting
as "valid_to" is probably the easiest for people to understand.

Comment 35 Simon Lyall 2003-11-09 22:33:04 UTC
>> The mail will get delivered twice.  Whichever arrives first will result in the 
>> hashcash stamp for jm@example.com being considered spent.  Whichever arrives 
>> 2nd will get no +ve (or -ve) scoring from the token.

>Yes, that sounds good.   The reason I raised it, is because there was talk of
>considering a double-spend event something that should be penalized (+ve SA
>score).  This example is a situation where a double-spend could occur without
>any spammer trickery involved, which would indicate that penalizing
>double-spends would be a bad idea.

It still makes sitewise hard. If a site has 100 people all subscribed to one
mailing list then that single token is going to be very overspent.

However as long as the overspend penalty isn't too much greater than the positive
value for the hashcash then it's as if it never existed. For mailing lists
hashcash won't provide much benefit but hopefully it won't do much harm.

Suggested tests with negative scores:

describe HASHCASH_MATCHES_VALID   Hashcash token matches value in valid_to
describe HASHCASH_NOT_MATCH_VALID Hashcash token doesn't match any valid_to
describe HASHCASH_20              Hashcash match 20 bits or more
describe HASHCASH_21              Hashcash match 21 bits or more
describe HASHCASH_22              Hashcash match 22 bits or more
describe HASHCASH_23              Hashcash match 23 bits or more
describe HASHCASH_24              Hashcash match 24 bits or more
describe HASHCASH_LOCAL_0         Hashcash not in local database
describe HASHCASH_LOCAL_1         Hashcash token 1 match in local database
describe HASHCASH_LOCAL_2_5       Hashcash token 2-5 matches in local database


Suggested tests with positive scores:

describe HASHCASH_EXPIRED         hashcash token has expired
describe HASHCASH_19              Hashcash match 19 bits or less
describe HASHCASH_LOCAL_6_10      Hashcash token 6-10 matches in local db
describe HASHCASH_LOCAL_10_30     Hashcash token 10-30 matches in local db
describe HASHCASH_LOCAL_30_100    Hashcash token 30-100 matches in local db
describe HASHCASH_LOCAL_100_PLUS  Hashcash token 100+ matches in local db

which should just about cover everything. As long as spammers can't use it to
get net-negative scores then the odd case where a legit user has his scores
cancel to zero shouldn't be a problem.
Comment 36 Adam Back 2003-11-10 22:10:08 UTC
> It still makes sitewise hard. If a site has 100 
> people all subscribed to one mailing list then 
> that single token is going to be very overspent.

I thought there was a way to do sitewise, I had worked
this out in the past as it was one of the deployment 
approaches discussed on various lists.  If I remember 
here's how it goes:

You keep a separate database for each recipient.  ie
so if two people joe@isp.com and fred@isp.com are both
subscribed to some list foo-list@lists.com then you'll 
key by envelope recipient and store the one for joe in 
joe.db and the one for fred in fred.db, and everything 
is happy once more.

(Note it doesn't have to be a separate database, but it
has to be logically separated in this way so that it
won't be considered a double spent token if it's 
addressed to someone else.)
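
The logical separation described above can be sketched in Python as a store keyed by envelope recipient. The class name and the in-memory dict are assumptions for illustration; the point is only that the same token counts as spent per recipient, not per site.

```python
class PerRecipientSpent:
    """Tracks spent tokens separately for each envelope recipient."""

    def __init__(self):
        self.by_recipient = {}  # envelope recipient -> set of spent tokens

    def spend(self, recipient, token):
        """Return True the first time this recipient sees the token."""
        spent = self.by_recipient.setdefault(recipient.lower(), set())
        if token in spent:
            return False
        spent.add(token)
        return True

store = PerRecipientSpent()
list_token = "token-for-foo-list"  # placeholder, not a real stamp
print(store.spend("joe@isp.com", list_token))
print(store.spend("fred@isp.com", list_token))
print(store.spend("joe@isp.com", list_token))
```

Joe and Fred each get to spend the list's token once; only Joe's second copy is treated as a replay.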

Now of course this is still subject to the limitations
that Justin gave, which is that neither joe nor fred
are going to get any +ve (or -ve) score from seeing a 
mail with a stamp for foo-list unless spamassassin knows
they are willing to receive mail as foo-list.

I'm not familiar with spamassassin config, so an additional
question is:

Is it possible with spamassassin in a multi-user 
environment for a user to have local config adding 
to the default of $USER@isp.com?

That would be pretty neat because then users who care can
get positive scoring from their list subscription traffic,
or forwarded mail traffic (say joe also is joe@pobox.com and
that is forwarded to joe@isp.com).  Users who don't understand
or care get the default.

Say fred also has a local hashcash aware MUA in addition, his
could then also add the missed +ve scoring from foo-list back 
in to exempt from spamassassin -ve spamminess score or 
adjust the spamassassin score.
Comment 37 Justin Mason 2003-12-15 00:23:05 UTC
OK, it's now in current CVS, and seems to be working; I didn't use Shane's code
in the end as I eventually realised how simple the hashcash verification step is
-- just take a SHA1 hash and count the bits! Great!
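
As an illustration of "take a SHA1 hash and count the bits", here is a short Python sketch. It assumes the stamp string itself (without the header name) is what gets hashed and that the value is the number of leading zero bits, as Adam's note about the all-zeros challenge suggests; it is not the code that went into CVS.

```python
import hashlib

def leading_zero_bits(stamp):
    """Count the leading zero bits of the SHA-1 digest of the stamp string."""
    digest = hashlib.sha1(stamp.encode("ascii")).digest()
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8  # whole byte is zero, keep going
            continue
        bits += 8 - byte.bit_length()  # zeros above the highest set bit
        break
    return bits

def has_enough_bits(stamp, required=20):
    """True if the stamp is worth at least `required` bits."""
    return leading_zero_bits(stamp) >= required

print(leading_zero_bits("0:030713:simon@darkmere.gen.nz:911d15251bc1a8a5"))
```

The default of 20 bits is taken from the rule names suggested earlier in this bug, not from any fixed requirement.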

'You keep a separate database for each recipient.  ie
so if two people joe@isp.com and fred@isp.com are both
subscribed to some list foo-list@lists.com then you'll 
key by envelope recipient and store the one for joe in 
joe.db and the one for fred in fred.db, and everything 
is happy once more.'

yep, that's what I've done.  Each user has their own db, and I've noted in the
Conf man page that sharing a db sitewide is a Bad Idea.

'Is it possible with spamassassin in a multi-user 
environment for a user to have local config adding 
to the default of $USER@isp.com?'

Yes, that'll work -- sysadmin sets up local.cf with

  hashcash_accept %u@hostname.com

and user can then have their own lines in ~/.spamassassin/user_prefs like

  hashcash_accept *@jmason.org

File-style globbing is permitted, and %u expands to the current username (where
applicable).
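
A hedged Python sketch of how %u expansion plus globbing could combine the site-wide and per-user lines above. The function names are hypothetical and the real option is handled by SpamAssassin's own config code; this only mirrors the behaviour as described in this comment.

```python
from fnmatch import fnmatchcase

def expand_accept(patterns, username):
    """Replace %u in each pattern with the current username."""
    return [pattern.replace("%u", username) for pattern in patterns]

def accepts(resource, patterns, username):
    """True if the resource matches any expanded glob pattern."""
    resource = resource.lower()
    return any(fnmatchcase(resource, p.lower())
               for p in expand_accept(patterns, username))

site_wide = ["%u@hostname.com"]   # from local.cf in the example above
per_user = ["*@jmason.org"]       # from user_prefs in the example above
patterns = site_wide + per_user

print(accepts("jm@hostname.com", patterns, username="jm"))
print(accepts("someone-else@hostname.com", patterns, username="jm"))
```

Any address at jmason.org is also accepted for this user, because the per-user glob is simply added to the site-wide default.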

BTW currently I have scores for values from 20 to 25 bits; anything over 25 just
gets the same score, and anything under 20 gets no bonus.  Do these ranges make
sense, or should we be using different ranges?
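
For discussion, the 20 to 25 bit banding can be sketched in Python as a clamp. The rule names follow the HASHCASH_nn pattern suggested earlier in this bug and are an assumption here; the names and scores actually committed are not shown in this thread.

```python
def hashcash_rule(bits, low=20, high=25):
    """Map a stamp's bit count to a rule name, or None below the minimum.

    Anything above `high` falls into the same band as `high`.
    """
    if bits < low:
        return None
    return "HASHCASH_%d" % min(bits, high)

for bits in (19, 20, 23, 30):
    print(bits, hashcash_rule(bits))
```

A 30-bit stamp gets the same rule as a 25-bit one, and a 19-bit stamp gets no bonus.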