The TimeBank 1.1 corpus is a set of 186 news report documents annotated with the 1.1 version of the TimeML standard for temporal annotation. This release should also include a copy of the TimeML schema version 1.1.
Provenance
Some of the documents in TimeBank are from the DUC1 summarization evaluation that NIST ran in 2001 (files whose names start with "AP", "LA", or "SJMN", and also some of the "WSJ" files). The rest of the articles are from ACE corpora. The files whose names start with "ABC", "CNN", "PRI", "VOA", "ea", and "ed" are broadcast news; those that start with "APW" and "NYT" are newswire. All of these are included in LDC catalog item LDC2003T11. The other ACE corpus is WSJ, whose documents are included in LDC catalog item LDC99T42. A number of TimeBank documents have been excluded from this release because they were annotated by 'naive' annotators and the results were not judged to be immediately usable.
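The prefix conventions above can be sketched as a small lookup. This is only an illustration, not a tool shipped with the release: the function name `source_of`, the label strings, and the file names in the usage note are hypothetical. Two points from the paragraph shape the logic: "APW" must be tested before "AP" because the prefixes overlap, and a "WSJ" name alone does not say which source a file came from.

```python
# Prefix table transcribed from the Provenance paragraph.
# The label strings are illustrative wording, not official corpus names.
DUC = "DUC summarization evaluation (NIST, 2001)"
ACE_BROADCAST = "ACE broadcast news (LDC2003T11)"
ACE_NEWSWIRE = "ACE newswire (LDC2003T11)"
WSJ_AMBIGUOUS = "WSJ: DUC or ACE (LDC99T42), not decidable from the prefix"

PREFIX_SOURCES = {
    "AP": DUC,
    "LA": DUC,
    "SJMN": DUC,
    "WSJ": WSJ_AMBIGUOUS,
    "ABC": ACE_BROADCAST,
    "CNN": ACE_BROADCAST,
    "PRI": ACE_BROADCAST,
    "VOA": ACE_BROADCAST,
    "ea": ACE_BROADCAST,
    "ed": ACE_BROADCAST,
    "APW": ACE_NEWSWIRE,
    "NYT": ACE_NEWSWIRE,
}


def source_of(filename):
    """Return the provenance label for a file name, or "unknown"."""
    # Try longer prefixes first so that "APW..." is not claimed by "AP".
    for prefix in sorted(PREFIX_SOURCES, key=len, reverse=True):
        if filename.startswith(prefix):
            return PREFIX_SOURCES[prefix]
    return "unknown"
```

For example, with made-up file names, `source_of("APW0001.tml")` yields the ACE newswire label while `source_of("AP0001.tml")` yields the DUC label. Matching is case-sensitive here because the paragraph lists both upper-case and lower-case prefixes.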
Areas Under Revision in the 1.1 Release
These documents were annotated during the creation of the TimeML standard and the Tango TimeML Graphical Organizer tool. They constitute both a test domain for development and a proof of concept, and should therefore be considered preliminary. Efforts are under way to revise these documents. In particular, the following aspects of the annotation are being reviewed:
Non-TimeML Markup
The documents contain other kinds of XML markup, including document format and structure information, named entity recognition, and sentence boundary information, among others. We have made no attempt to review this information.
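Because the TimeML annotation is interleaved with this other markup, a reader interested only in the temporal annotation may want to filter by tag name. The following is a minimal sketch under stated assumptions: the tag set is taken from the TimeML specification and should be checked against the schema included with the release, and the sample string (including its `DOC`, `s`, and `ENAMEX` tags) is invented for illustration rather than copied from a corpus file.

```python
import xml.etree.ElementTree as ET

# TimeML element names (assumed from the TimeML specification;
# verify against the schema distributed with the corpus).
TIMEML_TAGS = {"EVENT", "TIMEX3", "SIGNAL", "MAKEINSTANCE",
               "TLINK", "SLINK", "ALINK"}

# Invented sample: DOC, s, and ENAMEX stand in for the non-TimeML markup.
SAMPLE = (
    '<DOC><s><ENAMEX TYPE="ORG">Acme</ENAMEX> '
    '<EVENT eid="e1" class="OCCURRENCE">announced</EVENT> results '
    '<SIGNAL sid="s1">on</SIGNAL> '
    '<TIMEX3 tid="t1" type="DATE" value="1998-01-08">Thursday</TIMEX3>.'
    '</s></DOC>'
)


def timeml_elements(xml_text):
    """Return (tag, attributes, text) for each TimeML element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, dict(el.attrib), "".join(el.itertext()))
        for el in root.iter()
        if el.tag in TIMEML_TAGS
    ]
```

Calling `timeml_elements(SAMPLE)` returns the `EVENT`, `SIGNAL`, and `TIMEX3` elements and skips the `ENAMEX` and sentence markup. Filtering by name leaves the source files untouched, which seems the safer choice given that the non-TimeML markup has not been reviewed.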