TimeML Corpora

Several TimeML corpora have been created over the years. TimeBank started as an illustration and proof of concept of the TimeML specifications. TimeBank 1.1 was created in the early days of TimeML and follows version 1.1 of the specifications. The more recent TimeBank 1.2 and the AQUAINT corpus were compiled following version 1.2.1 of the specifications. The TempEval corpus was created for the SemEval-2007 workshop at ACL 2007 in Prague. It contains the same documents as TimeBank 1.2 but uses a simplified set of temporal relations, grouped into three separate tasks.

Future plans are (i) to merge TimeBank 1.2 and the AQUAINT TimeML corpus, and (ii) to create a significantly larger TimeBank using now widely accepted corpus creation standards such as dual annotation and the 90% rule.

TimeBank 1.2

The TimeBank 1.2 Corpus contains 183 news articles annotated according to the TimeML 1.2.1 specification. TimeBank 1.2 is free and is distributed by the Linguistic Data Consortium (LDC). For more details, see the documentation, which is essentially the same as the timebank.html file included in the LDC distribution, except that timebank.html has no section on inter-annotator agreement.

AQUAINT TimeML Corpus

The AQUAINT TimeML Corpus contains 73 news report documents. It is very similar in content to TimeBank 1.2 and follows the same specifications. Version 1.0 is available for download.

TempEval Corpus

This corpus, based on TimeBank 1.2, was created for the temporal relation task at SemEval-2007. Here, we make available the training and test data, as well as the evaluation data (the test data with the temporal relations filled in). For more information, see the readme.html files in the downloads, the SemEval and TempEval websites, and the TempEval task paper (pdf, bib). The source data are copyrighted by their respective content holders and may be used for academic purposes only.

TimeBank 1.1

In the beginning there was TimeBank 1.1; its old release notes are still available.