Benchmark the anti-spam industry!

It would be very valuable to have ongoing head-to-head benchmarking of all the current contenders in the anti-spam industry: not just the learning systems, but the online dynamic systems as well.
Form a consortium and operate a bunch of systems (as a paying customer, in the case of the commercial ones).
Feed every system the same data stream at the same time, and capture real-time state from the online dynamic systems (returning it to the providers so they can replay what went wrong, or right).
Publish the performance results.
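
To make the idea concrete, here is a minimal sketch of what the scoring side of such a harness might look like. Everything in it is an assumption for illustration: the `benchmark` function, the `Score` fields, the toy filters, and the four-message labeled stream are all hypothetical, and a real consortium would need far more care about labeling, timing, and the state of the online systems.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

# Assumption: a filter is any callable that returns True when it judges a message to be spam.
Filter = Callable[[str], bool]


@dataclass
class Score:
    false_positives: int = 0  # legitimate mail flagged as spam
    false_negatives: int = 0  # spam let through
    ham: int = 0              # legitimate messages seen
    spam: int = 0             # spam messages seen

    @property
    def false_positive_rate(self) -> float:
        return self.false_positives / self.ham if self.ham else 0.0

    @property
    def false_negative_rate(self) -> float:
        return self.false_negatives / self.spam if self.spam else 0.0


def benchmark(filters: Dict[str, Filter],
              stream: Iterable[Tuple[str, bool]]) -> Dict[str, Score]:
    """Show every filter the same labeled messages, in the same order."""
    scores = {name: Score() for name in filters}
    for message, is_spam in stream:
        for name, classify in filters.items():
            score = scores[name]
            verdict = classify(message)
            if is_spam:
                score.spam += 1
                if not verdict:
                    score.false_negatives += 1
            else:
                score.ham += 1
                if verdict:
                    score.false_positives += 1
    return scores


# Hypothetical labeled stream: (message text, is it really spam?)
stream = [
    ("cheap pills now", True),
    ("meeting at noon", False),
    ("win a free prize", True),
    ("free lunch on Friday", False),
]

# Two toy stand-ins for real products.
filters = {
    "keyword": lambda m: "free" in m or "cheap" in m,
    "permissive": lambda m: False,
}

results = benchmark(filters, stream)
for name, score in results.items():
    print(name, score.false_positive_rate, score.false_negative_rate)
```

The point of the sketch is only that identical input makes the two error rates directly comparable across systems; which rate matters more is a judgment call for whoever reads the published results.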

It would generate good data for further research, along with really useful, comparable performance metrics.
I’m not sure whether the commercial services would see that as a good thing or a bad thing.
Laggards in the horse race might prefer less measurement.
Actually, what I think it would show is that most systems are “almost good enough,” that all systems will soon be “good enough,” and that there’s little excuse not to deploy something, but that there’s plenty of room for distinction based on features such as administration, tunability, interface, and integration.
Still, one would hope that performance metrics would drive the industry forward.

A SPEC effort for real-time and offline anti-spam systems!
Is anyone else inspired by the idea of an unbiased testing and evaluation consortium?

