Training to exhaustion = adaptive boosting?

Gary Robinson,
inventor of the novel, clever, and useful
chi-squared non-Bayesian evidence combination method,
which in practice seems to work pretty darned well for classifying spam (better than Naive Bayes), has written an article on
Training to Exhaustion.

I think he has re-invented a less general version of the AdaBoost
algorithm, in which training inputs are weighted according to classification error. The specific weight adjustments in AdaBoost will probably converge much more quickly than the small incremental reweighting used in training to exhaustion, and Schapire’s paper shows some nice overall properties.
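For comparison, here is a minimal sketch of the AdaBoost reweighting step (not Robinson’s method): each round computes the weak learner’s weighted error, derives a learner weight alpha from it, and multiplicatively up-weights the misclassified examples. Labels are assumed to be ±1; the function names are mine, not from any paper.

```python
import math

def adaboost_reweight(weights, predictions, labels):
    """One AdaBoost round: weighted error, learner weight alpha,
    and the multiplicatively updated (renormalized) example weights."""
    # Weighted error of the current weak learner
    err = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    # Learner weight: larger when the learner is more accurate
    alpha = 0.5 * math.log((1 - err) / err)
    # Up-weight misclassified examples, down-weight correct ones
    new_w = [w * math.exp(alpha if p != y else -alpha)
             for w, p, y in zip(weights, predictions, labels)]
    z = sum(new_w)  # normalize so weights remain a distribution
    return alpha, [w / z for w in new_w]

# Example: four equally weighted examples, one misclassified
weights = [0.25, 0.25, 0.25, 0.25]
preds   = [1, 1, -1, -1]
labels  = [1, 1, -1, 1]
alpha, new_weights = adaboost_reweight(weights, preds, labels)
# The misclassified example's weight jumps from 0.25 to 0.5 in one round
```

That one-round jump is the contrast with training to exhaustion’s small incremental adjustments: AdaBoost’s update is derived to drive the training error down geometrically.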

While bag-of-words models work remarkably well considering how simple they are, I think that progress will come from elsewhere.
Instead of training harder (weighting hard examples in the training set), an algorithm could train “smarter” (applying more expensive techniques [e.g. extending n-gram length] but only for the hard examples). I’ve been contemplating experimenting with
the tradeoffs in cost/performance but haven’t had the time (yet).
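To make the “train smarter” idea concrete, here is a hypothetical sketch (my own illustration, not an implementation anyone has tested): use cheap unigram features by default, and extend to bigrams only for examples flagged as hard, so the extra feature-extraction cost is paid only where the simple model struggles.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def featurize(tokens, hard=False):
    """Cheap unigram features by default; add bigram features
    only for 'hard' examples, where the extra cost may pay off."""
    feats = set(ngrams(tokens, 1))
    if hard:
        feats |= set(ngrams(tokens, 2))
    return feats

easy = featurize(["cheap", "pills", "now"])
hard = featurize(["cheap", "pills", "now"], hard=True)
```

The "hard" flag would come from the classifier itself, e.g. examples the unigram model misclassifies or scores near the decision boundary.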


  1. Boris gets the credit for “inventing” this if there is credit to be given — though I suspect that the idea has bubbled up elsewhere and the main contribution here is testing it. I really don’t know. I thought it might be useful to post something about it so I did.

    I’ll alert Boris to your post and see if he wants to comment about its relationship to AdaBoost.

  2. Justin Mason says:

    Ha — “Ada” stands for adaptive? There was me thinking it stood for the language! ;)

    Very interesting work. Thanks to Gary and Boris for putting in the hours to empirically test this and write it up so well…
