Brandeis University, version 2, April 2006.
The TimeBank Corpus contains 183 news articles that have been annotated with temporal information, adding events, times and temporal links between events and times. The annotation follows the TimeML 1.2.1 specification. This file includes a brief discussion of TimeML as well as a description of how TimeBank was created.
The most recent information on TimeML is always available at www.timeml.org.
TimeML aims to capture and represent temporal information. This is accomplished using four primary tag types: TIMEX3 for temporal expressions, EVENT for temporal events, SIGNAL for temporal signals, and LINK for representing relationships. For a detailed description of TimeML, see the TimeML 1.2.1 Specification and Guidelines. Here, we give a summary of each tag.
TIMEX3 — This tag is used to capture dates, times, durations, and sets of dates and times. All TIMEX3 tags include a type and a value along with some other possible attributes. The value is given according to the ISO 8601 standard. The TIMEX3 tag also allows specification of a temporal anchor. This facilitates the use of temporal functions to calculate the value of an underspecified temporal expression. For example, an article might include a document creation time such as "January 3, 2006". Later in the article, the temporal expression "today" may occur. By anchoring the TIMEX3 for "today" to the document creation time, we can determine the exact value of the TIMEX3.
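The anchoring idea can be illustrated with a minimal Python sketch. It is not part of the TimeBank tooling: the function resolve_value and the table DEICTIC_OFFSETS are hypothetical names introduced here, and the dictionary keys only mirror TIMEX3 attribute names as we understand them from the TimeML specification, which should be consulted for the authoritative list.

```python
from datetime import date, timedelta

# Hypothetical lookup table: day offsets for a few deictic expressions.
DEICTIC_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def resolve_value(expression: str, anchor_value: str) -> str:
    """Return an ISO 8601 date for a deictic expression.

    anchor_value is the ISO 8601 value of the anchoring TIMEX3,
    for example the document creation time.
    """
    anchor = date.fromisoformat(anchor_value)
    offset = DEICTIC_OFFSETS[expression.lower()]
    return (anchor + timedelta(days=offset)).isoformat()


# Document creation time "January 3, 2006", written as an ISO 8601 value.
dct = {"tid": "t0", "type": "DATE", "value": "2006-01-03"}

# "today" is underspecified on its own; anchoring it to t0 yields its value.
today = {
    "tid": "t1",
    "type": "DATE",
    "anchorTimeID": dct["tid"],
    "value": resolve_value("today", dct["value"]),
}

print(today["value"])  # 2006-01-03
```

A real temporal function has to handle many more expression types (durations, sets, partially specified dates); the sketch covers only the single-day case described above.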
EVENT — The EVENT tag is used to annotate those
elements in a text that mark the semantic events described by it. Any
event that can be temporally anchored or ordered is captured with this
tag. An EVENT includes a class attribute with values such
SIGNAL — The SIGNAL tag is used to annotate temporal function words such as "after", "during", and "when". These signals are then used in the representation of a temporal relationship.
The following three tags are link tags. They capture temporal, subordination, and aspectual relationships found in the text. These tags do not consume any actual text, but they do relate the three tag types above to each other.
TLINK — Temporal links are represented with a TLINK
tag. A TLINK can temporally relate two temporal expressions, two
event instances, or a temporal expression and an event instance.
Along with an identification marker for each of these two elements, a
relation type is given such as
SLINK — This tag is used to capture subordination
relationships that involve event modality, evidentiality, and
factuality. An SLINK includes an event instance ID for the
subordinating event and an event instance ID for the subordinated
event. Possible relation types for SLINK
ALINK — An aspectual connection between two event
instances is represented with ALINK. As with SLINK, this tag includes
two event instance IDs, one that introduces the ALINK and one that is
the event argument to that event. The introducing event has the
Tag             Count
EVENT            7935
MAKEINSTANCE     7940
TIMEX3           1414
SIGNAL            688
ALINK             265
SLINK            2932
TLINK            6418
Total           27592
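Per-tag counts of this kind can be obtained by iterating over the elements of each annotated document. The following Python sketch shows one way to do so with the standard library. The fragment in SAMPLE is hand-written for illustration and is not taken from the corpus; its attribute names and values reflect our reading of TimeML and are not validated against the schema. The function name count_tags is likewise an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written illustrative fragment, NOT an actual TimeBank document.
SAMPLE = """<TimeML>
<TIMEX3 tid="t0" type="DATE" value="2006-01-03">January 3, 2006</TIMEX3>
The company <EVENT eid="e1" class="REPORTING">said</EVENT>
<TIMEX3 tid="t1" type="DATE" value="2006-01-03">today</TIMEX3> that profits
<EVENT eid="e2" class="OCCURRENCE">rose</EVENT>
<SIGNAL sid="s1">after</SIGNAL> the
<EVENT eid="e3" class="OCCURRENCE">merger</EVENT>.
<MAKEINSTANCE eiid="ei1" eventID="e1"/>
<MAKEINSTANCE eiid="ei2" eventID="e2"/>
<MAKEINSTANCE eiid="ei3" eventID="e3"/>
<TLINK lid="l1" relType="AFTER" eventInstanceID="ei2"
       relatedToEventInstance="ei3" signalID="s1"/>
</TimeML>"""


def count_tags(xml_text: str) -> Counter:
    """Count every element below the document root, keyed by tag name."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter() if el is not root)


counts = count_tags(SAMPLE)
for tag, n in sorted(counts.items()):
    print(f"{tag:<14}{n:>4}")
print(f"{'Total':<14}{sum(counts.values()):>4}")
```

Summing the resulting Counter objects over all documents in a directory would give corpus-level totals comparable to the table above.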
The TimeBank sources come from a variety of news reports. Specifically, articles come from the Automatic Content Extraction (ACE) program and PropBank (TreeBank2) texts. Those coming from ACE come from transcribed broadcast news from the following sources: ABC, CNN, PRI, and VOA, and newswire from AP and NYT. PropBank supplied articles from the Wall Street Journal.
These particular sources were chosen because they offered text rich with temporal information both in the form of temporal expressions and events that could be anchored or ordered in time.
The annotation of TimeBank has been a multi-step process. In the first phase, five annotators with varying backgrounds in linguistics took part. In addition to their annotation work, each participated in the development of the TimeML annotation scheme. This phase of the annotation took place during several annotation-intensive weeks. Throughout this time, the annotators met to discuss their work so that they could achieve a high level of annotator agreement.
The annotation of each document during this phase of the effort began with a preprocessing step. This involved the tagging of some events and signals. When possible, preprocessing also attempted to supply the class, tense, and aspect of the tagged events. After preprocessing, one of the five annotators completed the annotation of the document including a check of the output from the preprocessing step.
During this phase of the annotation effort, TimeML was still under development. Subsequent phases of annotation involved updating this early version of TimeBank to the current TimeML specification, version 1.2.1. This has been done automatically where possible and manually where needed.
The most recent phase of the TimeBank development involved four annotators who have all previously participated in some TimeML annotation and are intimately familiar with the latest specification. Each annotator focused on a specific set of TimeML tags and used the TimeBank browser to check whether the annotation of his or her tags is accurate and complete. This current release of TimeBank reflects this work.
A subset of ten documents from TimeBank 1.2 was independently annotated by two experienced annotators. To measure the agreement on tag extents, the average of precision and recall was computed with one annotator's data as the key and the other's as the response. The tag extent for link tags was defined as the combined tag extents of the two linked events and times.
agreement    exact match    partial match
TIMEX3       0.83           0.96
SIGNAL       0.77           0.77
EVENT        0.78           0.81
ALINK        0.81           -
SLINK        0.85           -
TLINK        0.55           -
The low inter-annotator agreement score for TLINKs is due to the large number of event pairs that can be selected for specifying temporal links.
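The extent agreement measure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the script used to produce the published figures: extents are represented as (start, end) character offsets, only exact matches are counted, and the function name average_precision_recall is hypothetical.

```python
def average_precision_recall(key, response):
    """Average of precision and recall over exactly matching extents.

    key and response are collections of (start, end) offsets, one per tag.
    """
    key, response = set(key), set(response)
    matched = len(key & response)
    precision = matched / len(response) if response else 0.0
    recall = matched / len(key) if key else 0.0
    return (precision + recall) / 2


# Invented extents for two annotators on the same text.
annotator_a = {(0, 4), (10, 15), (20, 27), (30, 33)}  # used as the key
annotator_b = {(0, 4), (10, 15), (21, 27)}            # used as the response

score = average_precision_recall(annotator_a, annotator_b)
print(round(score, 2))  # 0.58
```

Because swapping key and response only exchanges precision and recall, the average is the same whichever annotator is treated as the key. A partial-match variant would count overlapping extents as matches instead of requiring identical offsets.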
For agreement on features, both average P&R and the more traditional Kappa score were used:
agreement                  P&R     Kappa
TIMEX3.type                1.00    1.00
TIMEX3.value               0.90    0.89
TIMEX3.temporalFunction    0.95    0.87
TIMEX3.mod                 0.95    0.73
EVENT.class                0.77    0.67
EVENT.pos                  0.99    0.96
EVENT.tense                0.96    0.93
EVENT.aspect               1.00    1.00
EVENT.polarity             1.00    1.00
EVENT.modality             1.00    1.00
ALINK.relType              0.80    0.63
SLINK.relType              0.98    0.96
TLINK.relType              0.77    0.71
It should be noted that some of the figures in the above tables are not statistically reliable because of the small size of the subcorpus over which the statistics were obtained. Most notably, the inter-annotator agreement numbers for ALINKs as well as the polarity feature for events are not reliable.
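For readers unfamiliar with the Kappa score used for the feature agreement figures, the sketch below computes Cohen's kappa for two annotators over the same items. That the published numbers use exactly this variant is an assumption on our part; the text above does not say which Kappa statistic was applied. The function name cohen_kappa and the label sequences are invented for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of labels."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout; kappa is
        # undefined here, and returning 1.0 is a convention of this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented class labels assigned by two annotators to the same six events.
a = ["OCCURRENCE", "OCCURRENCE", "STATE", "REPORTING", "STATE", "OCCURRENCE"]
b = ["OCCURRENCE", "STATE", "STATE", "REPORTING", "STATE", "OCCURRENCE"]

print(round(cohen_kappa(a, b), 2))  # 0.74
```

In this example the annotators agree on five of six items (0.83), but kappa is lower because part of that agreement is expected by chance given how often each label is used.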
Contains a readme.txt file as well as a more prosaic version in timebank.html (this file). Also includes the TimeML specifications and guidelines.
Contains the XML schema and DTD as well as the Perl script and Java class used for validation of TimeBank.
Contains the 183 annotated documents with TimeML tags only.
Contains the 183 annotated documents with TimeML tags and some extra tags like sentence markers, document-level tags and some entity tags.
Luc Belanger, Bran Boguraev, Jose Castaño, David Day, Jenna Fernandes, Lisa Ferro, Robert Gaizauskas, Linda van Guilder, Patrick Hanks, Jerry Hobbs, Seo-Hyun Im, Robert Ingria, Graham Katz, Robert Knippen, Innokenti Kremerman, Marcia Lazo, Jessica Littman, Inderjeet Mani, James Pustejovsky, Dragomir Radev, Anna Rumshisky, Antonio Sanfilippo, Roser Saurí, Andrew See, Andrea Setzer, Oleg Sofryguine, Amber Stubbs, Beth Sundheim, Svetlana Symonenko, Marc Verhagen, Harris Wu.
We would like to express our sincere thanks to John Prange and Penny S. Lehtola from ARDA, without whose funding, TimeML and TimeBank would not have been possible. We would also like to thank Mark Maybury of MITRE, for making the facilities of the NRRC at MITRE Bedford available to us during the initial development of TimeBank, throughout both the TERQAS and TANGO workshops.
The annotations in this data collection are copyrighted by Brandeis University. User acknowledges and agrees that: (i) as between User and Brandeis University, Brandeis University owns all the right, title and interest in the Annotated Content, unless expressly stated otherwise; (ii) nothing in this Agreement shall confer in User any right of ownership in the Annotated Content; and (iii) User is granted a non-exclusive, royalty free, worldwide license (with no right to sublicense) to use the Annotated Content solely for academic and research purposes. This Agreement is governed by the law of the Commonwealth of Massachusetts and User agrees to submit to the exclusive jurisdiction of the Massachusetts courts.
Note: The textual news documents annotated in this corpus have been collected from a wide range of sources and are not copyrighted by Brandeis University. The user acknowledges that the use of these news documents is restricted to research and/or academic purposes only.