TimeBank 1.2 Documentation

James Pustejovsky, Jessica Littman, Roser Saurí, Marc Verhagen
Brandeis University, version 2, April 2006.

Contents

  1. Introduction
  2. Overview of TimeML
  3. TimeBank Statistics
  4. TimeBank Sources
  5. Annotation Effort
  6. Inter-Annotator Agreement
  7. Validation
  8. Contents of the LDC Distribution
  9. Future Work
  10. Contributors
  11. Copyright Notice

Introduction

The TimeBank Corpus contains 183 news articles annotated with temporal information: events, times, and temporal links between events and times. The annotation follows the TimeML 1.2.1 specification. This file briefly discusses TimeML and describes how TimeBank was created.

The most recent information on TimeML is always available at www.timeml.org.

Overview of TimeML

TimeML aims to capture and represent temporal information. This is accomplished using four primary tag types: TIMEX3 for temporal expressions, EVENT for temporal events, SIGNAL for temporal signals, and LINK for representing relationships. For a detailed description of TimeML, see the TimeML 1.2.1 Specification and Guidelines. Here, we give a summary of each tag.

TIMEX3 — This tag is used to capture dates, times, durations, and sets of dates and times. Every TIMEX3 tag includes a type and a value, along with several optional attributes. The value is given according to the ISO 8601 standard. The TIMEX3 tag also allows a temporal anchor to be specified, which makes it possible to use temporal functions to calculate the value of an underspecified temporal expression. For example, an article might include a document creation time such as "January 3, 2006". Later in the article, the temporal expression "today" may occur. By anchoring the TIMEX3 for "today" to the document creation time, we can determine the exact value of the TIMEX3.
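The anchoring idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TimeML's temporal-function machinery: the lookup table `RELATIVE_OFFSETS` and the function `resolve_relative` are hypothetical, only three day-level expressions are handled, and the anchor is assumed to be an ISO 8601 calendar date such as the document creation time.

```python
from datetime import date, timedelta

# Hypothetical lookup table: day offsets for a few underspecified expressions.
# Real TimeML annotation covers far more cases than this.
RELATIVE_OFFSETS = {"today": 0, "yesterday": -1, "tomorrow": 1}


def resolve_relative(expression: str, anchor_value: str) -> str:
    """Return the ISO 8601 value of a relative day expression.

    anchor_value is the ISO 8601 value (YYYY-MM-DD) of the anchoring TIMEX3,
    for example the document creation time. Unknown expressions raise KeyError.
    """
    anchor = date.fromisoformat(anchor_value)
    offset = RELATIVE_OFFSETS[expression.lower()]
    return (anchor + timedelta(days=offset)).isoformat()


# The document creation time "January 3, 2006" has the ISO 8601 value 2006-01-03.
print(resolve_relative("today", "2006-01-03"))      # 2006-01-03
print(resolve_relative("yesterday", "2006-01-03"))  # 2006-01-02
```

Computing values from the anchor, rather than storing them independently, means that correcting the document creation time automatically corrects every expression anchored to it.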

EVENT — The EVENT tag is used to annotate those elements in a text that mark the semantic events described by it. Any event that can be temporally anchored or ordered is captured with this tag. An EVENT includes a class attribute with values such as occurrence, state, or reporting. The class of an EVENT may indicate what relationships the event participates in. In addition to the EVENT tag, events are also annotated with one or more MAKEINSTANCE tags that include information about a particular instance of the event. This includes part of speech, tense, aspect, modality, and polarity. When an event participates in a relationship, it is actually the event instance that is referenced. This is to allow for statements such as "John taught on Monday but not on Tuesday." Here, there are actually two instances of the teaching-event: one that has a positive polarity and one that is negative. Further, each instance participates in its own temporal relationship with respect to "Monday" and "Tuesday".
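The split between an EVENT and its instances can be illustrated by parsing a small TimeML-style fragment with Python's standard `xml.etree.ElementTree`. The fragment is hand-written, not taken from TimeBank, and its attribute names (`eid`, `class`, `eiid`, `eventID`, `pos`, `tense`, `aspect`, `polarity`) are assumed to follow the TimeML 1.2.1 specification; `instances_by_event` is a hypothetical helper. The single "taught" EVENT receives two MAKEINSTANCE tags that differ only in polarity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hand-written, simplified fragment for "John taught on Monday but not on Tuesday."
# Not copied from TimeBank: TIMEX3 and TLINK tags and most attributes are omitted.
FRAGMENT = """<TimeML>
John <EVENT eid="e1" class="OCCURRENCE">taught</EVENT> on Monday but not on Tuesday.
<MAKEINSTANCE eiid="ei1" eventID="e1" pos="VERB" tense="PAST" aspect="NONE" polarity="POS"/>
<MAKEINSTANCE eiid="ei2" eventID="e1" pos="VERB" tense="PAST" aspect="NONE" polarity="NEG"/>
</TimeML>"""


def instances_by_event(root: ET.Element) -> dict:
    """Map each EVENT's text to the (eiid, polarity) pairs of its instances."""
    event_text = {ev.get("eid"): ev.text for ev in root.iter("EVENT")}
    grouped = defaultdict(list)
    for inst in root.iter("MAKEINSTANCE"):
        grouped[event_text[inst.get("eventID")]].append(
            (inst.get("eiid"), inst.get("polarity"))
        )
    return dict(grouped)


print(instances_by_event(ET.fromstring(FRAGMENT)))
# {'taught': [('ei1', 'POS'), ('ei2', 'NEG')]}
```

Each instance ID (`ei1`, `ei2`) can then be linked separately, one to "Monday" and one to "Tuesday".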

SIGNAL — The SIGNAL tag is used to annotate temporal function words such as "after", "during", and "when". These signals are then used in the representation of a temporal relationship.

The following three tags are link tags. They capture temporal, subordination, and aspectual relationships found in the text. These tags do not consume any actual text, but they do relate the three tag types above to each other.

TLINK — Temporal links are represented with a TLINK tag. A TLINK can temporally relate two temporal expressions, two event instances, or a temporal expression and an event instance. Along with an identification marker for each of these two elements, a relation type is given such as before, includes, or ended_by. When a signal is present that helps to define the relationship, an ID for the SIGNAL is given as well.
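Because a TLINK may relate either kind of element, a reader of the data has to check which attributes are present. The sketch below normalizes a TLINK into a (source, relation, target, signal) record. It assumes the TimeML 1.2.1 attribute names `eventInstanceID` or `timeID` for the source, `relatedToEventInstance` or `relatedToTime` for the target, and `signalID` for the optional signal, with relation types written in upper case; `TemporalLink` and `read_tlink` are hypothetical names, and the two TLINK elements are invented examples.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class TemporalLink(NamedTuple):
    source: str            # an event instance ID (eiid) or a TIMEX3 ID (tid)
    rel_type: str          # e.g. BEFORE, INCLUDES, ENDED_BY
    target: str
    signal: Optional[str]  # SIGNAL ID, if a signal helps define the relation


def read_tlink(elem: ET.Element) -> TemporalLink:
    """Normalize a TLINK element, whichever pair of argument types it links."""
    source = elem.get("eventInstanceID") or elem.get("timeID")
    target = elem.get("relatedToEventInstance") or elem.get("relatedToTime")
    return TemporalLink(source, elem.get("relType"), target, elem.get("signalID"))


# Invented examples: event-to-time with a signal, and event-to-event without one.
links = [
    ET.fromstring('<TLINK lid="l1" eventInstanceID="ei1" relatedToTime="t1" '
                  'relType="IS_INCLUDED" signalID="s1"/>'),
    ET.fromstring('<TLINK lid="l2" eventInstanceID="ei2" '
                  'relatedToEventInstance="ei3" relType="BEFORE"/>'),
]
for link in links:
    print(read_tlink(link))
```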

SLINK — This tag is used to capture subordination relationships that involve event modality, evidentiality, and factuality. An SLINK includes an event instance ID for the subordinating event and an event instance ID for the subordinated event. Possible relation types for SLINK include modal, evidential, and factive. An SLINK will typically not include a signal ID unless it has the relation type conditional. Three specific EVENT classes interact with SLINK: reporting, i_state, and i_action.
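Since an SLINK refers to event instances rather than events, finding the class of its subordinating event means following the instance ID back to its EVENT. The sketch below does this to flag SLINKs whose subordinating event is not REPORTING, I_STATE, or I_ACTION. Treat it as a review aid, not a validity rule: it assumes the TimeML 1.2.1 attribute names `eventInstanceID` and `subordinatedEventInstance` and upper-case class values, and the fragment and the helper `unusual_slinks` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Event classes that the text names as interacting with SLINK.
SLINK_CLASSES = {"REPORTING", "I_STATE", "I_ACTION"}

# Hand-written fragment: an evidential SLINK from "said" to "ended".
FRAGMENT = """<TimeML>
Officials <EVENT eid="e1" class="REPORTING">said</EVENT> the talks
<EVENT eid="e2" class="OCCURRENCE">ended</EVENT>.
<MAKEINSTANCE eiid="ei1" eventID="e1" polarity="POS"/>
<MAKEINSTANCE eiid="ei2" eventID="e2" polarity="POS"/>
<SLINK lid="l1" eventInstanceID="ei1" subordinatedEventInstance="ei2" relType="EVIDENTIAL"/>
</TimeML>"""


def unusual_slinks(root: ET.Element) -> list:
    """Return IDs of SLINKs whose subordinating event is outside SLINK_CLASSES.

    A heuristic only: some SLINKs (for example conditional ones) may
    legitimately be introduced by events of other classes.
    """
    event_class = {ev.get("eid"): ev.get("class") for ev in root.iter("EVENT")}
    # Map each event instance ID to the class of the EVENT it instantiates.
    instance_class = {
        mi.get("eiid"): event_class.get(mi.get("eventID"))
        for mi in root.iter("MAKEINSTANCE")
    }
    return [
        link.get("lid")
        for link in root.iter("SLINK")
        if instance_class.get(link.get("eventInstanceID")) not in SLINK_CLASSES
    ]


print(unusual_slinks(ET.fromstring(FRAGMENT)))  # []
```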

ALINK — An aspectual connection between two event instances is represented with ALINK. As with SLINK, this tag includes two event instance IDs, one that introduces the ALINK and one that is the event argument to that event. The introducing event has the class aspectual. Some possible relation types for ALINK are initiates, terminates, and continues.
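The requirement that the introducing event has the class aspectual can be checked mechanically. This sketch lists each ALINK as (introducing word, relation, argument word) and raises an error when the introducing event is not ASPECTUAL. It assumes the TimeML 1.2.1 attribute names `eventInstanceID` and `relatedToEventInstance` for ALINK and upper-case values in the XML; the fragment and the helper `describe_alinks` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment: "began" (aspectual) initiates "building".
FRAGMENT = """<TimeML>
Workers <EVENT eid="e1" class="ASPECTUAL">began</EVENT>
<EVENT eid="e2" class="OCCURRENCE">building</EVENT> the bridge.
<MAKEINSTANCE eiid="ei1" eventID="e1"/>
<MAKEINSTANCE eiid="ei2" eventID="e2"/>
<ALINK lid="l1" eventInstanceID="ei1" relatedToEventInstance="ei2" relType="INITIATES"/>
</TimeML>"""


def describe_alinks(root: ET.Element) -> list:
    """List each ALINK as (introducing word, relType, argument word).

    Raises ValueError if an introducing event is not of class ASPECTUAL.
    """
    events = {ev.get("eid"): ev for ev in root.iter("EVENT")}
    # Map each event instance ID to the EVENT element it instantiates.
    instance = {mi.get("eiid"): events[mi.get("eventID")] for mi in root.iter("MAKEINSTANCE")}
    described = []
    for link in root.iter("ALINK"):
        head = instance[link.get("eventInstanceID")]
        arg = instance[link.get("relatedToEventInstance")]
        if head.get("class") != "ASPECTUAL":
            raise ValueError(f"ALINK {link.get('lid')}: introducing event is {head.get('class')}")
        described.append((head.text, link.get("relType"), arg.text))
    return described


print(describe_alinks(ET.fromstring(FRAGMENT)))
# [('began', 'INITIATES', 'building')]
```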

TimeBank Statistics

TimeBank 1.2 contains 183 articles with just over 61,000 non-punctuation tokens. The count for each TimeML tag is listed below:

Tag            Count
EVENT           7935
MAKEINSTANCE    7940
TIMEX3          1414
SIGNAL           688
ALINK            265
SLINK           2932
TLINK           6418
Total          27592
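Counts like these can be recomputed from the distributed files. The sketch below assumes that each document in timebank_1.2/data/timeml/ is a separate well-formed XML file with a `.tml` extension and un-namespaced tag names; `count_tags` is a hypothetical helper, and its totals have not been checked against the table above.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

TIMEML_TAGS = ("EVENT", "MAKEINSTANCE", "TIMEX3", "SIGNAL", "ALINK", "SLINK", "TLINK")


def count_tags(directory: str) -> Counter:
    """Count TimeML tags over every *.tml file in a directory (assumed layout)."""
    counts = Counter()
    for path in sorted(Path(directory).glob("*.tml")):
        # Walk every element in the document and tally the TimeML ones.
        for elem in ET.parse(path).iter():
            if elem.tag in TIMEML_TAGS:
                counts[elem.tag] += 1
    return counts
```

For example, `count_tags("timebank_1.2/data/timeml")["TLINK"]` would return the number of TLINK tags found under these assumptions, and `sum(count_tags(...).values())` the overall total.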

TimeBank Sources

TimeBank articles come from a variety of news sources, namely the Automatic Content Extraction (ACE) program and the PropBank (TreeBank2) texts. The ACE articles consist of transcribed broadcast news from ABC, CNN, PRI, and VOA, and newswire from AP and NYT. PropBank supplied articles from the Wall Street Journal.

These particular sources were chosen because they offered text rich with temporal information both in the form of temporal expressions and events that could be anchored or ordered in time.

Annotation Effort

The annotation of TimeBank has been a multi-step process. In the first phase, five annotators with varying backgrounds in linguistics took part. In addition to their annotation work, each participated in the development of the TimeML annotation scheme. This phase of the annotation took place during several annotation-intensive weeks. Throughout this time, the annotators met to discuss their work so that they could achieve a high level of annotator agreement.

The annotation of each document during this phase of the effort began with a preprocessing step. This involved the tagging of some events and signals. When possible, preprocessing also attempted to supply the class, tense, and aspect of the tagged events. After preprocessing, one of the five annotators completed the annotation of the document including a check of the output from the preprocessing step.

During this phase of the annotation effort, TimeML was still under development. Subsequent phases of annotation involved updating this early version of TimeBank to the current TimeML specification, version 1.2.1. This has been done automatically where possible and manually where needed.

The most recent phase of TimeBank development involved four annotators who had all previously participated in TimeML annotation and are intimately familiar with the latest specification. Each annotator focused on a specific set of TimeML tags and used the TimeBank browser to check whether the annotation of those tags was accurate and complete. The current release of TimeBank reflects this work.

Inter-Annotator Agreement

A subset of ten documents from TimeBank 1.2 was independently annotated by two experienced annotators. To measure agreement on tag extents, the average of precision and recall was computed, with one annotator's data as the key and the other's as the response. For link tags, the tag extent was defined as the combined tag extents of the two linked events or times.
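The extent measure can be made concrete with a short sketch. It assumes that extents are half-open character offsets, that a "partial match" means any overlap, and that the average is the arithmetic mean of precision and recall; the spans and the helpers `overlaps` and `extent_agreement` are invented for illustration, and both span lists must be non-empty.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True if two half-open spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def extent_agreement(key: List[Span], response: List[Span], partial: bool = False) -> float:
    """Average of precision and recall, treating one annotator's spans as the key."""
    def match(a: Span, b: Span) -> bool:
        return overlaps(a, b) if partial else a == b

    # Precision: share of response spans that match some key span.
    precision = sum(any(match(r, k) for k in key) for r in response) / len(response)
    # Recall: share of key spans matched by some response span.
    recall = sum(any(match(k, r) for r in response) for k in key) / len(key)
    return (precision + recall) / 2


key = [(0, 5), (10, 18), (30, 35)]
response = [(0, 5), (12, 18)]
print(round(extent_agreement(key, response), 2))                # 0.42
print(round(extent_agreement(key, response, partial=True), 2))  # 0.83
```

Under exact matching, the response span (12, 18) counts as a miss; under partial matching it overlaps the key span (10, 18) and counts as a hit, which is why partial-match scores can only be equal or higher.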

                  Agreement
TimeML tag     Exact match   Partial match
TIMEX3         0.83          0.96
SIGNAL         0.77          0.77
EVENT          0.78          0.81
ALINK          0.81          -
SLINK          0.85          -
TLINK          0.55          -

The low inter-annotator agreement score for TLINKs is due to the large number of event pairs that can be selected for specifying temporal links.
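The scale of that choice follows from simple counting: a document with n events and times offers n(n-1)/2 unordered pairs, each of which could receive a TLINK. With an illustrative n = 50 (not a TimeBank statistic):

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad \binom{50}{2} = \frac{50 \cdot 49}{2} = 1225
```

Since annotators select only some of these pairs, two annotators can easily choose different subsets even when they agree on the underlying temporal ordering.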

For agreement on features, both average P&R and the more traditional Kappa score were used:

                             Agreement
TimeML tag and attribute     P&R    Kappa
TIMEX3.type                  1.00   1.00
TIMEX3.value                 0.90   0.89
TIMEX3.temporalFunction      0.95   0.87
TIMEX3.mod                   0.95   0.73
EVENT.class                  0.77   0.67
EVENT.pos                    0.99   0.96
EVENT.tense                  0.96   0.93
EVENT.aspect                 1.00   1.00
EVENT.polarity               1.00   1.00
EVENT.modality               1.00   1.00
ALINK.relType                0.80   0.63
SLINK.relType                0.98   0.96
TLINK.relType                0.77   0.71
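Kappa corrects raw agreement for the agreement expected by chance. The text does not say which kappa variant was used; the sketch below assumes Cohen's kappa for two annotators, and the EVENT.class labels are invented for illustration, not TimeBank data. `cohens_kappa` is a hypothetical helper.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and returning 1.0 is only a convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented EVENT.class labels for ten events (not TimeBank data).
annotator_a = ["OCCURRENCE"] * 6 + ["STATE"] * 2 + ["REPORTING"] * 2
annotator_b = ["OCCURRENCE"] * 5 + ["STATE"] * 3 + ["REPORTING"] * 2
print(round(cohens_kappa(annotator_a, annotator_b), 2))  # 0.83
```

Here the annotators agree on 9 of 10 items, but because both favour OCCURRENCE, chance agreement is 0.40 and kappa comes out at 0.83. Kappa is never higher than the raw agreement rate, which is why it penalizes skewed label distributions such as those of EVENT.class.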

Some of the figures in the above tables have limited statistical significance because of the small size of the subcorpus over which they were computed. Most notably, the inter-annotator agreement numbers for ALINKs and for the polarity feature of events are not reliable.

Validation

TimeBank 1.2 has been validated against version 1.2.1 of the TimeML Document Type Definition (DTD) and XML Schema. Validity checking against the DTD was performed using the Perl XML::Checker::Parser module, available as part of XML-Checker-0.13 from www.cpan.org. The XML Schema was applied using the Xerces Java Parser, version 1.4.4, available at xerces.apache.org. Please refer to the readme file in the validation directory for more information.
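Where neither Perl nor Java is at hand, a quick stdlib-only sanity check can still catch one common class of error: references to IDs that no tag declares. This is not DTD or XML Schema validation and does not replace it. The ID and reference attribute names below are assumed from the TimeML 1.2.1 specification and are illustrative rather than exhaustive; `dangling_references` is a hypothetical helper, and the document fragment is invented.

```python
import xml.etree.ElementTree as ET

# Assumed TimeML 1.2.1 attribute names; illustrative, not exhaustive.
ID_ATTRS = ("eid", "eiid", "tid", "sid")
REF_ATTRS = ("eventID", "eventInstanceID", "timeID", "signalID",
             "relatedToEventInstance", "relatedToTime",
             "subordinatedEventInstance", "anchorTimeID")


def dangling_references(root: ET.Element) -> list:
    """Return (tag, attribute, value) for each reference to an undeclared ID."""
    declared = {elem.get(a) for elem in root.iter() for a in ID_ATTRS if elem.get(a)}
    return [
        (elem.tag, attr, elem.get(attr))
        for elem in root.iter()
        for attr in REF_ATTRS
        if elem.get(attr) and elem.get(attr) not in declared
    ]


doc = ET.fromstring(
    '<TimeML><EVENT eid="e1" class="OCCURRENCE">left</EVENT>'
    '<MAKEINSTANCE eiid="ei1" eventID="e1"/>'
    '<TLINK lid="l1" eventInstanceID="ei1" relatedToEventInstance="ei9" relType="BEFORE"/>'
    '</TimeML>'
)
print(dangling_references(doc))  # [('TLINK', 'relatedToEventInstance', 'ei9')]
```

For a file on disk, `dangling_references(ET.parse(path).getroot())` applies the same check.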

Contents of the LDC Distribution

timebank_1.2/doc/

A readme.txt file, as well as a more prosaic version in timebank.html (this file). The directory also includes the TimeML specifications and guidelines.

timebank_1.2/dtd/

Contains the XML schema and DTD as well as the Perl script and Java class used for validation of TimeBank.

timebank_1.2/data/timeml/

Contains the 183 annotated documents with TimeML tags only.

timebank_1.2/data/extra/

Contains the 183 annotated documents with TimeML tags plus some extra tags, such as sentence markers, document-level tags, and entity tags.

Future Work

TimeBank is a new resource that is under constant revision. In the near future, we will explore the following issues:

Please contact us at timebank@timeml.org if you have any remarks on the state or quality of TimeBank.

Contributors to TimeBank

The following is a list of people who have contributed to TimeBank through annotation, construction of authoring and validation tools, participation in TimeML discussions, or otherwise:
Luc Belanger, Bran Boguraev, Jose Castaño, David Day, Jenna Fernandes, Lisa Ferro, Robert Gaizauskas, Linda van Guilder, Patrick Hanks, Jerry Hobbs, Seo-Hyun Im, Robert Ingria, Graham Katz, Robert Knippen, Innokenti Kremerman, Marcia Lazo, Jessica Littman, Inderjeet Mani, James Pustejovsky, Dragomir Radev, Anna Rumshisky, Antonio Sanfilippo, Roser Saurí, Andrew See, Andrea Setzer, Oleg Sofryguine, Amber Stubbs, Beth Sundheim, Svetlana Symonenko, Marc Verhagen, Harris Wu.

We would like to express our sincere thanks to John Prange and Penny S. Lehtola from ARDA, without whose funding TimeML and TimeBank would not have been possible. We would also like to thank Mark Maybury of MITRE for making the facilities of the NRRC at MITRE Bedford available to us during the initial development of TimeBank, throughout both the TERQAS and TANGO workshops.

Copyright Notice

The annotations in this data collection are copyrighted by Brandeis University. User acknowledges and agrees that: (i) as between User and Brandeis University, Brandeis University owns all the right, title and interest in the Annotated Content, unless expressly stated otherwise; (ii) nothing in this Agreement shall confer in User any right of ownership in the Annotated Content; and (iii) User is granted a non-exclusive, royalty free, worldwide license (with no right to sublicense) to use the Annotated Content solely for academic and research purposes. This Agreement is governed by the law of the Commonwealth of Massachusetts and User agrees to submit to the exclusive jurisdiction of the Massachusetts courts.

Note: The textual news documents annotated in this corpus have been collected from a wide range of sources and are not copyrighted by Brandeis University. The user acknowledges that the use of these news documents is restricted to research and/or academic purposes only.