"Stopping Spam"

Joshua Goodman, Microsoft

Thursday, October 7, 2004 at 4:00 P.M.

Lubrano Conference Room

Spam is a huge and growing problem. I'll first survey solutions to spam, including filtering approaches (machine learning, fuzzy hashing, and blackhole lists) and "postage" approaches (reverse Turing tests, computational puzzles, and monetary challenges). Our favorite technique is a machine learning/text classification approach combined with a challenge/response postage approach. I'll talk about problems and solutions we've had in practice, especially how we have gotten millions of messages of labeled training data, both good and spam. I'll also talk briefly about my research on personalizing spam filters, which turns out to be important but harder than we thought. I'll show some analyses of those millions of messages, including where spam actually comes from, and why legal solutions can only stop a fraction of spam. Finally, I'll talk about why email in general, and spam in particular, need their own new field, combining aspects of machine learning, networking, cryptography/security, HCI, and economics.

Joint work with Geoff Hulten, Robert Rounthwaite, David Heckerman, and others.

Bio: Joshua Goodman started his professional life as a developer at Dragon Systems, working on speech recognition. He then went to grad school at Harvard University, receiving a Ph.D. for his work in statistical natural language processing, especially statistical parsing. From there, he went to Microsoft Research, where he worked on language modeling. For the past two and a half years, he has been working on stopping spam, including helping start Microsoft's Anti-Spam Technology Group. He is General Chair for the Conference on Email and Anti-Spam 2005, which everyone should attend.

Host: Eugene Charniak