Archive for the 'spam' Category

MessageLabs versus GMail

Monday, March 31st, 2008

MessageLabs mail forwarders have been unwilling to talk to GMail servers since at least Saturday 2008-03-29, with a mix of TCP “connection refused” and SMTP “421 Service Temporarily Unavailable”.

Perhaps it’s related to the flurry of articles about GMail CAPTCHA cracking three weeks ago and the resulting surge of spam.

Whatever the reason, it’s a painful outage.

Followup, Tuesday 2008-04-01 7:48am EDT:

MessageLabs appears to be listening to GMail’s servers again. New messages are flowing. I haven’t seen the messages queued up during the outage, yet.

More detail on the Google CAPTCHA-cracking botnet.

Spam introspection

Wednesday, October 6th, 2004

Georgetown University sends spam and faces the wrath of one of its own students.

I’m also getting a little tired of “call for papers” spam sent by otherwise-legitimate conference organizers to lists of web-harvested email addresses. My most frequent offenders will remain nameless for now, but only because I’m busy.

Just because you’re not a fraudulent criminal enterprise doesn’t mean you’re not a spammer. It would not be a bad thing if everyone started worrying about CAN-SPAM being enforced against them.

Mail server choices for anti-spam — hijacked or derailed by patents?

Thursday, September 2nd, 2004

Yakov Shafranovich on Sender ID and software patents from Microsoft: Part I, Part II

Update: Eric Raymond was “quoted a promise of a license with no royalties and no requirement to sign an agreement.” That would be helpful if such a license came to pass.

Clever Zombie Tracking by Manipulating DNS Views

Thursday, August 12th, 2004

James Lick: Tracking A Zombie Army (PDF)
[via Asrg]

Yahoo DomainKeys draft specification

Wednesday, May 19th, 2004

Yahoo publishes its DomainKeys specification. FAQ at Yahoo! Anti-Spam Resource Center - DomainKeys.

I must say that I share Justin Mason’s distrust and disdain for software patents. What the heck is patentable among these ideas anyway? They seem like obvious applications of digital signatures and DNS publication. The most generous interpretation is that these might be defensive patents, and that for all intents, the IETF-required license is good enough.

Is this or SPF likely to take the world by storm? Either one permits senders to publish records that permit receivers to make some authentication judgments.

Well, deployment by senders is a bit more work (sign those messages) for DK than for SPF. But SPF breaks what has been considered normal forwarding behavior, in a way that the sender has no control over except by saying “put up with it” or by turning off SPF.
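The forwarding problem is easier to see in a toy model. The following is a minimal sketch, not real SPF: the `PUBLISHED` table, the `spf_like_check` function, and all domains and addresses are hypothetical stand-ins for a DNS lookup and a policy that ends in a hard fail. The point it illustrates is that the check compares the connecting IP against the networks published for the envelope sender's domain, so a forwarder that relays the message unchanged connects from an address the original sender never listed.

```python
import ipaddress

# Hypothetical published policy: networks allowed to send mail for each domain.
# In real SPF this would come from a DNS TXT record, not a dictionary.
PUBLISHED = {
    "example.org": ["192.0.2.0/24"],
}

def spf_like_check(envelope_domain, connecting_ip):
    """Return "pass", "fail", or "none" for a simplified SPF-style check."""
    networks = PUBLISHED.get(envelope_domain)
    if networks is None:
        return "none"  # the domain publishes no policy at all
    ip = ipaddress.ip_address(connecting_ip)
    if any(ip in ipaddress.ip_network(net) for net in networks):
        return "pass"
    return "fail"

# Direct delivery from the sender's own mail server.
direct = spf_like_check("example.org", "192.0.2.25")

# The same message relayed by a forwarder that keeps the original envelope
# sender: the receiver sees the forwarder's IP, which example.org never listed.
forwarded = spf_like_check("example.org", "198.51.100.7")

print(direct, forwarded)
```

Under these assumptions the only fixes available are the ones mentioned above: the forwarder rewrites the envelope sender, the receiver tolerates the failure, or the sender weakens or withdraws the policy.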

Deployment by receivers has no particular downside for either scheme — you’re basically implementing sender-requested filtering, and who can complain about that?

Of course, initially, rather than trying to subvert either scheme, spammers will avoid both. Is it possible that the world will shift so much that just being a non-DK domain will count against the sender? I do think it’s possible. At which point, yes, spammers adopt the technology but subvert it with throwaway domains and proxy zombies with access to signing servers. In the end you can’t avoid reputation systems: trusted third parties (some even having good incentives to rate accurately and respond quickly), blacklists, etc.

Chi-squared evidence combination

Thursday, April 29th, 2004

More on Gary Robinson’s improved chi-squared evidence combination at Handling Redundancy in Email Token Probabilities
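For readers who have not met the technique, here is a minimal sketch of the basic chi-squared combining step (Fisher's method applied to per-token spam probabilities) as it is commonly implemented in open-source filters. It does not include the redundancy handling that the linked paper describes. The function names `chi2q` and `combine` are my own, and the sketch assumes every token probability lies strictly between 0 and 1.

```python
import math

def chi2q(x2, v):
    """Probability that a chi-squared variable with v degrees of freedom
    exceeds x2. Uses the closed form that exists when v is even."""
    if v <= 0 or v % 2:
        raise ValueError("v must be a positive even integer")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Combine per-token spam probabilities into one score in [0, 1].
    Near 1 means spam, near 0 means ham, near 0.5 means unsure."""
    n = len(probs)
    if n == 0:
        return 0.5
    ln_spam = sum(math.log(p) for p in probs)
    ln_ham = sum(math.log(1.0 - p) for p in probs)
    s = 1.0 - chi2q(-2.0 * ln_ham, 2 * n)   # strength of the spam evidence
    h = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)  # strength of the ham evidence
    return (1.0 + s - h) / 2.0

print(combine([0.99, 0.95, 0.98]))  # strongly spammy tokens
print(combine([0.99, 0.01]))        # conflicting evidence
```

A useful property of this form is that strongly conflicting evidence lands near 0.5 rather than being pushed to one extreme, which gives a natural "unsure" band.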

Benchmark the anti-spam industry!

Friday, March 26th, 2004

It would be very valuable to have an ongoing head-to-head benchmarking of all the current contenders in the anti-spam industry — not just the learning systems, but the online dynamic systems as well. Form a consortium, operate a bunch of systems (be a customer of commercial systems). Use the same simultaneous data stream as input, and capture real-time state from the online dynamic systems (return it to the providers so they can replay what went wrong [or right]). Publish the performance results.

It would generate good data for more research, and really useful comparable performance metrics. I’m not sure if that would be seen as a good thing or a bad thing by the commercial services. Laggards in the horserace might prefer less measurement. Actually, what I think it would show is that most systems are “almost good enough,” that all systems will soon be “good enough,” that there’s little excuse not to deploy something, but there’s plenty of space for distinction based on features such as administration, tunability, interface, integration. But one would hope that performance metrics would drive the industry forward.

A SPEC effort for real-time and offline anti-spam systems! Is anyone else inspired by the idea of a non-biased testing/evaluation consortium?

DSPAM does noise reduction and bi-grams

Friday, March 26th, 2004

I’ve tried CRM114 and know it performs very well. I’m just catching up on my DSPAM reading.

Bayesian Noise Reduction looks really helpful, and reduces the cost of implementing bi-grams (Chained Tokens in DSPAM terminology). Author Jonathan A. Zdziarski gives typical storage figures of 0.5MB-1MB for the average user without bigrams, and 10MB-20MB with. Disk is cheap.
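To see where the extra storage comes from, here is a minimal sketch of bi-gram tokenization: every adjacent pair of words becomes a token in addition to the single words. This is not DSPAM's tokenizer. The function name `tokens_with_bigrams`, the regular expression, and the `+` joiner are assumptions made for illustration, and noise reduction is not shown.

```python
import re

def tokens_with_bigrams(text):
    """Return single-word tokens followed by adjacent-pair tokens."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    pairs = [f"{a}+{b}" for a, b in zip(words, words[1:])]
    return words + pairs

sample = "Buy cheap pills now, cheap pills shipped now"
unigrams = set(re.findall(r"[a-z0-9']+", sample.lower()))
everything = set(tokens_with_bigrams(sample))

# The number of distinct tokens to store grows once pairs are included.
print(len(unigrams), len(everything))
```

Even on one short line the distinct-token count more than doubles; over a real mailbox the pair vocabulary grows much faster than the word vocabulary, which is consistent with the larger per-user figures quoted above.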

Personally I was thinking of experimenting with boosting into longer n-grams as a way of achieving some space and time tradeoffs. I haven’t had time, though.

While I don’t disbelieve the performance numbers, I do wish for more corpora (larger and more diverse) and standardized performance metrics.

Forging S/MIME signatures

Tuesday, March 23rd, 2004

Jon Udell tries his hand at S/MIME signature forgery, revealing that PKI is not a panacea.

A digital signature proves something. The proof is strong but the something is weak (if it just demonstrates that you clicked a few things to get a persona certificate).

So if you need to prove something stronger, then you put limits on what digitally-signed content you’re willing to accept. This can go in at least two directions (not mutually exclusive):

  • higher-class certificates (where certificate authorities demand more proof, and encode that fact in the certificate). But higher quality means harder to get and less actual deployment. And higher quality means more attractive target for theft of keys.
  • reputation systems. Of course, building robust reputation systems is not easy. Users may wish to have multiple sources of reputation information to fit their own definitions of good and bad behavior and how fast those judgments are made. It replays the whole DNS blacklist deployment. Some reputation systems may seem arbitrary and capricious. Others may be too slow or too tolerant. They are all lawsuit targets. Will there be too many to choose from?

For message classification, there is a predisposition to disparage machine learning and content inspection as too probabilistic and uncertain, while viewing signatures as certain and reliable. It is not so: the uncertainty is not eliminated, and the need for trust does not go away. Both just move to a different level.

Caller ID, Domain Keys, SPF

Friday, March 5th, 2004

Larry Seltzer (eWeek) compares, contrasts, and predicts in Who Will Win the SMTP Authentication Wars?:

This isn’t like three brands of bleach, where you’ve got the same chemicals in all three bottles. In fact, the more you look at these standards, the more different they look. I had been fearful that having three major standards competing would be discouraging to the market, since explaining even one of them isn’t easy. And consider that the three major mail providers in the United States—AOL, Yahoo! and Microsoft—are implementing the three different standards. I think, however, that the three, or at least two of them, could complement each other. The ideal solution may be all three, or some later standard that combines the features of two or three.

[via Christopher Allen]