ITAC (Informal Text Anonymisation Corpus)

Please cite this paper when using the ITAC corpus.

Data

The corpus consists of the following files:
- DEV.bla    development data (annotated with 'blanket' annotation scheme)
- DEV.sel    development data (annotated with 'selective' annotation scheme)
- TST.bla    test data (annotated with 'blanket' annotation scheme)
- TST.sel    test data (annotated with 'selective' annotation scheme)
- TRN.txt    unlabeled training data
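
The naming convention in the list above (a split prefix plus an extension naming the annotation scheme) can be sketched in Python as follows. This is only an illustrative helper, not part of the corpus distribution: the function names `describe` and `missing_files` are hypothetical, and it assumes the archive has already been unpacked into a single directory containing the files named above.

```python
from pathlib import Path

# Mappings taken from the file list above.
SPLITS = {"DEV": "development", "TST": "test", "TRN": "training"}
SCHEMES = {"bla": "blanket", "sel": "selective", "txt": None}  # None = unlabeled

# Files named in this README (assumed to sit in one directory after unpacking).
EXPECTED = ("DEV.bla", "DEV.sel", "TST.bla", "TST.sel", "TRN.txt")


def describe(filename):
    """Return (split, annotation scheme) for a file name such as 'DEV.bla'."""
    stem, _, ext = Path(filename).name.partition(".")
    return SPLITS[stem], SCHEMES[ext]


def missing_files(corpus_dir, expected=EXPECTED):
    """List the expected file names that are not present in corpus_dir."""
    root = Path(corpus_dir)
    return [name for name in expected if not (root / name).is_file()]
```

For example, `describe("TST.sel")` gives the split and scheme for the selectively annotated test file, and `missing_files(".")` reports which of the listed files are absent from the current directory.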

Download here (gzipped tar file):