TimeBank 1.1

The TimeBank 1.1 corpus is a set of 186 news report documents annotated with version 1.1 of the TimeML standard for temporal annotation. This release should also include a copy of version 1.1 of the TimeML schema.

Provenance

Some of the documents in TimeBank are from the DUC1 summarization evaluation that NIST ran in 2001: the files whose names start with "AP" (but not "APW"), "LA", and "SJMN", as well as some of the "WSJ" files. The rest of the articles are from ACE corpora. Files whose names start with "ABC", "CNN", "PRI", "VOA", "ea", and "ed" are broadcast news; those that start with "APW" and "NYT" are newswire. All of these are included in LDC catalog item LDC2003T11. The other ACE corpus is WSJ, whose documents are included in LDC catalog item LDC99T42. A number of TimeBank documents have been excluded from this release because they were annotated by 'naive' annotators and the results were not judged to be immediately usable.
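Because "APW" files also begin with "AP", a lookup from file name to source has to prefer the longest matching prefix. The sketch below is a hypothetical helper, not part of the release: the function name, the source labels, and the example file names are assumptions, and the prefix table is taken only from the list above. A "WSJ" prefix alone cannot tell DUC files from ACE files, so that entry stays ambiguous.

```python
# Hypothetical helper: map a TimeBank 1.1 file name to its source using the
# prefixes from the Provenance section. Labels are illustrative assumptions.
SOURCES = {
    "AP": "DUC 2001 (NIST)",
    "LA": "DUC 2001 (NIST)",
    "SJMN": "DUC 2001 (NIST)",
    "WSJ": "DUC 2001 or ACE WSJ (LDC99T42)",  # prefix alone is ambiguous
    "ABC": "ACE broadcast news (LDC2003T11)",
    "CNN": "ACE broadcast news (LDC2003T11)",
    "PRI": "ACE broadcast news (LDC2003T11)",
    "VOA": "ACE broadcast news (LDC2003T11)",
    "ea": "ACE broadcast news (LDC2003T11)",
    "ed": "ACE broadcast news (LDC2003T11)",
    "APW": "ACE newswire (LDC2003T11)",
    "NYT": "ACE newswire (LDC2003T11)",
}


def source_of(filename: str) -> str:
    """Return the source label for a file name, or 'unknown' if no prefix matches.

    Matching is case-sensitive, and the longest matching prefix wins, so
    "APW..." is classified as ACE newswire rather than DUC "AP...".
    """
    matches = [prefix for prefix in SOURCES if filename.startswith(prefix)]
    if not matches:
        return "unknown"
    return SOURCES[max(matches, key=len)]
```

For example, a (hypothetical) name such as "APW19980807.0261.tml" resolves to ACE newswire, while "AP900815-0044.tml" resolves to DUC 2001.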

Areas Under Revision in the 1.1 Release

These documents were annotated during the development of the TimeML standard and of the Tango TimeML Graphical Organizer tool. They constitute both a test domain for development and a proof of concept, and should therefore be considered preliminary. Users should be aware that efforts are under way to revise these documents. In particular, the following aspects of the annotation are being reviewed:
  • incomplete temporal linking
    Temporal relations have been manually annotated only between selected pairs of events or of events and times. A temporal closure algorithm will add links between many more pairs of events and times.
  • event classes
    Event classification is currently being improved based on multiple annotations by different annotators.
  • annotation of tense/aspect
    Tense/aspect information is incomplete for many events.
  • incomplete subordinated linking
    Some conditional structures and purposive clauses may not have been annotated.
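To illustrate what temporal closure adds to the manually annotated links, here is a minimal sketch that handles only the BEFORE relation, using its transitivity (if A is before B and B is before C, then A is before C). It is an illustration under that simplifying assumption, not the closure algorithm referred to above: a full closure over all TimeML relation types is considerably more involved. The function name and the identifiers "e1", "e2", "t1" are hypothetical.

```python
from collections import defaultdict


def close_before(links):
    """Transitive closure of BEFORE links (simplified illustration).

    links: iterable of (a, b) pairs meaning "a BEFORE b", where a and b are
    event or time identifiers. An AFTER link can be supplied reversed, as (b, a).
    Returns the set of all (x, y) pairs implied by transitivity, including
    the original links. Inconsistent (cyclic) input is not detected.
    """
    successors = defaultdict(set)
    for a, b in links:
        successors[a].add(b)

    closed = set()
    for start in list(successors):
        # Depth-first search for every node reachable from `start`.
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closed.add((start, node))
            stack.extend(successors.get(node, ()))
    return closed
```

Given the two annotated links ("e1", "e2") and ("e2", "t1"), the sketch also derives ("e1", "t1"), a link no annotator had to write down.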

Non-TimeML Markup

The documents contain other kinds of XML markup, including document format and structure information, named entity annotations, sentence boundaries, and others. We have made no attempt to review this information.
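Users who want only the TimeML annotation can skip the other markup by filtering on element names. The sketch below assumes that a document parses as well-formed XML with Python's standard ElementTree parser, that TimeML elements are not namespaced, and that the TimeML 1.1 tag set is EVENT, TIMEX3, SIGNAL, MAKEINSTANCE, TLINK, SLINK and ALINK; check these assumptions against the TimeML 1.1 schema shipped with the release. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed TimeML 1.1 element names; every other element is treated as
# non-TimeML markup (document structure, named entities, sentences, ...).
TIMEML_TAGS = {"EVENT", "TIMEX3", "SIGNAL", "MAKEINSTANCE", "TLINK", "SLINK", "ALINK"}


def timeml_elements(xml_text):
    """Return (tag, attributes) for each TimeML element, in document order."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the whole tree in document order, including the root.
    return [(el.tag, dict(el.attrib)) for el in root.iter() if el.tag in TIMEML_TAGS]
```

On a small made-up fragment containing an ENAMEX element alongside EVENT, TIMEX3 and TLINK, only the three TimeML elements are returned.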