GenSpam

Please cite this paper when using the GenSpam corpus.

Data

The corpus consists of six files:
- train_GEN.ems    Genuine email training data (8158 messages)
- train_SPAM.ems    Spam training data (30099 messages)
- test_GEN.ems    Genuine test data (754 messages)
- test_SPAM.ems    Spam test data (797 messages)
- adapt_GEN.ems    Adaptive genuine training data (300 messages)
- adapt_SPAM.ems    Adaptive spam training data (300 messages)
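
The class balance differs sharply between the splits listed above, which matters when setting up an experiment. As a minimal sketch, the Python snippet below tabulates the message counts per split and the fraction of spam in each. The counts are copied from the list above. The helper names `file_name` and `split_summary` are hypothetical and not part of the corpus distribution, and the snippet does not read or parse the .ems files themselves.

```python
# Message counts copied from the file list above.
# The dictionary layout and function names are illustrative assumptions.
COUNTS = {
    "train": {"GEN": 8158, "SPAM": 30099},
    "test": {"GEN": 754, "SPAM": 797},
    "adapt": {"GEN": 300, "SPAM": 300},
}


def file_name(split, label):
    """Build a corpus file name such as 'train_GEN.ems'."""
    return f"{split}_{label}.ems"


def split_summary(counts=COUNTS):
    """Return the total message count and spam fraction for each split."""
    summary = {}
    for split, by_label in counts.items():
        total = sum(by_label.values())
        summary[split] = {
            "total": total,
            "spam_fraction": by_label["SPAM"] / total,
        }
    return summary


if __name__ == "__main__":
    for split, info in split_summary().items():
        files = ", ".join(file_name(split, label) for label in COUNTS[split])
        print(f"{split}: {info['total']} messages, "
              f"{info['spam_fraction']:.1%} spam ({files})")
```

Run as a script, this prints one line per split. The training data is heavily skewed towards spam, while the test and adaptation sets are close to balanced.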

Download here (gzipped tar file):