Archive for March 2004

More China blocking

The spread of the Witty worm

Shannon and Moore:
The Spread of the Witty Worm:

Witty infected only about a tenth as many hosts than the next smallest widespread Internet worm. Where SQL Slammer infected between 75,000 and 100,000 computers, the vulnerable population of the Witty worm was only about 12,000 computers. Although researchers have long predicted that a fast-probing worm could infect a small population very quickly, Witty is the first worm to demonstrate this capability. While Witty took 30 minutes longer than SQL Slammer to infect its vulnerable population, both worms spread far faster than human intervention could stop them. In the past, users of software that is not ubiquitously deployed have considered themselves relatively safe from most network-based pathogens. Witty demonstrates that a remotely accessible bug in any minimally popular piece of software can be successfully exploited by an automated attack.

Benchmark the anti-spam industry!

It would be very valuable to have an ongoing head-to-head benchmarking of all the current contenders in the anti-spam industry — not just the learning systems, but the online dynamic systems as well.
Form a consortium, operate a bunch of systems (be a customer of commercial systems).
Use the same simultaneous data stream as input, and capture real-time state from the online dynamic systems (return it to the providers so they can replay what went wrong [or right]).
Publish the performance results.
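
To make the idea concrete, here is a minimal sketch of what such a harness could look like: every filter sees the same labeled message stream, and each one gets its own false-positive and false-negative rate. The names (`Score`, `benchmark`, the two toy filters) are illustrative assumptions, not any vendor's API, and a real consortium would also need the state-capture step for the online dynamic systems, which is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

# Assumption: a filter is any callable that takes message text
# and returns True when it judges the message to be spam.
Filter = Callable[[str], bool]


@dataclass
class Score:
    ham: int = 0              # legitimate messages seen
    spam: int = 0             # spam messages seen
    false_positives: int = 0  # ham flagged as spam
    false_negatives: int = 0  # spam let through as ham

    @property
    def fp_rate(self) -> float:
        return self.false_positives / self.ham if self.ham else 0.0

    @property
    def fn_rate(self) -> float:
        return self.false_negatives / self.spam if self.spam else 0.0


def benchmark(filters: Dict[str, Filter],
              stream: Iterable[Tuple[str, bool]]) -> Dict[str, Score]:
    """Feed the same (text, is_spam) stream to every filter, in order."""
    scores = {name: Score() for name in filters}
    for text, is_spam in stream:
        for name, classify in filters.items():
            verdict = classify(text)
            score = scores[name]
            if is_spam:
                score.spam += 1
                if not verdict:
                    score.false_negatives += 1
            else:
                score.ham += 1
                if verdict:
                    score.false_positives += 1
    return scores


# Two toy contenders and a four-message hand-labeled stream (all made up).
filters = {
    "keyword": lambda text: "viagra" in text.lower(),
    "accept-all": lambda text: False,
}
stream = [
    ("Buy viagra now", True),
    ("Lunch at noon?", False),
    ("Cheap meds, no prescription", True),
    ("Viagra study results for the journal club", False),
]
scores = benchmark(filters, stream)
for name, score in scores.items():
    print(f"{name}: FP rate {score.fp_rate:.2f}, FN rate {score.fn_rate:.2f}")
```

Reporting the two error rates separately, rather than a single accuracy figure, matters because a lost legitimate message usually costs far more than a delivered spam.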

It would generate good data for more research, and really useful comparable performance metrics.
I’m not sure if that would be seen as a good thing or a bad thing by the commercial services. Laggards in the horserace might prefer less measurement.
Actually, what I think it would show is that most systems are “almost good enough,” that all systems will soon be “good enough,” that there’s little excuse not to deploy something, but there’s plenty of space for distinction based on features such as administration, tunability, interface, integration.
But one would hope that performance metrics would drive the industry forward.

A SPEC effort for real-time and offline anti-spam systems!
Is anyone else inspired by the idea of a non-biased testing/evaluation consortium?

DSPAM does noise reduction and bi-grams

I’ve tried CRM114 and know it performs very well.
I’m just catching up on my DSPAM reading.

Bayesian Noise Reduction looks really helpful, and reduces the cost of implementing bi-grams (Chained Tokens in DSPAM terminology).
Author Jonathan A. Zdziarski gives typical storage figures of 0.5MB-1MB for the average user without bigrams, and 10MB-20MB with. Disk is cheap.
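
To show why bi-grams cost so much more storage, here is a small sketch of the general idea: emit every single word plus every pair of adjacent words as a token. This is not DSPAM's actual tokenizer; the regular expression and the `+` separator are assumptions made for illustration.

```python
import re
from typing import List


def words(text: str) -> List[str]:
    # Assumption: a token is a run of lowercase letters or digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def chained_tokens(text: str) -> List[str]:
    """Return the single-word tokens followed by adjacent-pair tokens."""
    unigrams = words(text)
    bigrams = [f"{a}+{b}" for a, b in zip(unigrams, unigrams[1:])]
    return unigrams + bigrams


print(chained_tokens("Click here now"))
```

A message of n words adds only n-1 pair tokens, but across a whole mailbox the set of distinct pairs grows much faster than the set of distinct words, which fits the order-of-magnitude jump in the storage figures above.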

Personally I was thinking of experimenting with boosting into longer n-grams as a way of achieving some space and time tradeoffs. I haven’t had time, though.

While I don’t disbelieve the performance numbers, I do wish for more corpora (larger and more diverse) and standardized performance metrics.

UB buys IBM BladeCenter

Keeping track of my colleagues down the street:
ClusterWorld | University at Buffalo Adds IBM Blades:

The new supercomputer, capable of a peak performance of more than 1.32 TeraFlops, will consist of a cluster of 266 IBM eServer BladeCenter HS20 systems running Red Hat Advance Server 2.1 Linux, each with two 2.8 GHz Intel Xeon processors and 1.0 GB of memory. Seven IBM xSeries 345 Intel processor-based servers connect to 5 terabytes (TB) of IBM FAStT700 Storage to house large volumes of biological and research data. The supercomputer forms the basis of the IBM eServer Cluster 1350, a pre-packaged and tested supercluster that is ultra-dense and incredibly easy to manage.

Forging S/MIME signatures

Jon Udell tries his hand at S/MIME signature forgery,
revealing that PKI is not a panacea.

A digital signature proves something. The proof is strong but the something is weak (if it just demonstrates that you clicked a few things to get a persona certificate).

So if you need to prove something stronger, then you put limits on what digitally-signed content you’re willing to accept.
This can go in at least two directions (not mutually exclusive):

  • higher-class certificates (where certificate authorities demand more proof, and encode that fact in the certificate). But higher quality means harder to get and less actual deployment. And higher quality means more attractive target for theft of keys.
  • reputation systems. Of course, building robust reputation systems is not easy. Users may wish to have multiple sources of reputation information to fit their own definitions of good and bad behavior and how fast those judgments are made. It replays the whole DNS blacklist deployment. Some reputation systems may seem arbitrary and capricious. Others may be too slow or too tolerant. They are all lawsuit targets. Will there be too many to choose from?

For message classification, there is a predisposition to disparage machine learning and content inspection as too probabilistic and uncertain, while viewing signatures as certain and reliable. It is not so: the uncertainty, or the trust, is not eliminated; it is just at a different level.

unescaped, escaped, double-escaped

Tim Bray explores the mess related to escaping HTML/XML information:

The policy ideally should be, I think, that all data in the Your Code block has to be known to be escaped or known to be unescaped. That is to say, you always do escaping on the data at the pointy end of the input arrows, or you never do it.

I think always-unescaped is a little better, since some of those output arrows might not be XML or HTML, but probably they all are; so always-escaped is certainly viable.

and then it gets worse, as treatment of HTML in RSS aggregators varies.

The same problem presents itself in cross-site scripting and code injection attacks.
It’s the bane of macro language beginners too, whether it’s shell or troff.
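
A short sketch of the always-unescaped policy, using Python's standard `html` module: data stays raw inside the program and is escaped exactly once, at the output boundary. The `render_item` helper and the sample string are made up for illustration.

```python
from html import escape, unescape


def render_item(title: str) -> str:
    # Policy: 'title' is always raw text; escape once, right at output.
    return f"<h1>{escape(title)}</h1>"


raw = "AT&T <announces> merger"

once = escape(raw)    # correct for HTML output
twice = escape(once)  # the double-escaping bug: the reader sees "&amp;"

print(once)    # AT&amp;T &lt;announces&gt; merger
print(twice)   # AT&amp;amp;T &amp;lt;announces&amp;gt; merger
print(render_item(raw))

# Unescaping is the inverse only if you know how many times data was escaped.
assert unescape(once) == raw
assert unescape(twice) == once
```

The bug is invisible in the type system: `raw`, `once`, and `twice` are all plain strings, so the only defense is a consistent policy about which of them a given variable is allowed to hold.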

Avolio – Security Redux

Fred Avolio’s Weblog: Security Redux succinctly summarizes how many aspects of the security discussion are not new, but resurface because of ignorance of the field.

My first receipt of a CAPTCHA-bearing virus

During the last round of virus innovation a couple of weeks ago (email viruses with encrypted payloads bearing passwords in the text, circa March 3), I predicted to a colleague that the next obvious step would be an embedded CAPTCHA image to make it harder for antivirus gateways to find the password for decoding encrypted attachments. It didn’t take long; I received my first CAPTCHA-bearing virus last Saturday (March 13).

Of course, this is only a novelty in email viruses. It’s old hat for email spam; for example, a significant proportion of the Russian-language spam I see is image-only, with an embedded phone number, and not even a single URI.

As for the virus, Trend Micro OfficeScan identifies the extracted file as PE_BAGLE.N-O. Here’s a snippet of the message:

Delivery-Date: Sat Mar 13 20:00:06 2004
Received: from ( [])
        by (8.12.9/8.12.4) with SMTP id i2E0xxGf013039
        for <>; Sat, 13 Mar 2004 20:00:00 -0500 (EST)
Date: Sat, 13 Mar 2004 19:59:56 -0500
Subject: Re: Thank you!
Message-ID: <>
MIME-Version: 1.0
Content-Type: multipart/mixed;
Content-Length: 34491
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Your file is attached.<br><br>
<BR>Password - <img  src="cid:rjsdmyhbsf.bmp"><BR>
Content-Type: image/bmp; name="rjsdmyhbsf.bmp"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="rjsdmyhbsf.bmp"
Content-ID: <rjsdmyhbsf.bmp>
Content-Type: application/octet-stream; name=""
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename=""
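
A gateway cannot read the password out of the image, but it can at least notice the structure. The following is a rough sketch, using Python's standard `email` package, of a heuristic that lists inline images referenced from an HTML part that mentions a password. The sample message is a reconstruction for illustration: the `XYZ` boundary and the tiny base64 body are made up, and the heuristic itself is an assumption, not how any antivirus product works.

```python
import re
from email import message_from_string
from typing import List

# Matches <img ... src="cid:NAME"> and captures NAME.
CID_IMG = re.compile(r'<img[^>]+src="cid:([^"]+)"', re.IGNORECASE)


def password_image_refs(raw_message: str) -> List[str]:
    """Return Content-IDs of images referenced by HTML parts that say 'password'."""
    msg = message_from_string(raw_message)
    refs: List[str] = []
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "us-ascii"
        html = payload.decode(charset, errors="replace")
        if "password" in html.lower():
            refs.extend(CID_IMG.findall(html))
    return refs


# Hypothetical sample; boundary and image bytes are invented.
SAMPLE = """\
MIME-Version: 1.0
Subject: Re: Thank you!
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Your file is attached.<br><br>
<BR>Password - <img  src="cid:rjsdmyhbsf.bmp"><BR>
--XYZ
Content-Type: image/bmp; name="rjsdmyhbsf.bmp"
Content-Transfer-Encoding: base64
Content-ID: <rjsdmyhbsf.bmp>

Qk0=
--XYZ--
"""

print(password_image_refs(SAMPLE))
```

A rule like this is trivially evaded by rewording the body, which is the arms race the post describes.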

Bypassing China’s firewall