During the 1990s, the field of computational linguistics achieved several practical advances through the construction of annotated corpora for developing and testing various technologies. This document describes an initial set of guidelines for annotating time expressions with a canonicalized representation of the times they refer to. This work has been carried out in support of a number of research activities under DARPA's Translingual Information Detection, Extraction, and Summarization (TIDES) research program. The research areas that can benefit most directly include question answering (e.g., answering "when" questions), event characterization and tracking, visualization of events on timelines, and production of biographical summaries. Other research related to information extraction may also benefit.
This is a draft document. It is expected to undergo considerable revision based on feedback from other researchers before it is officially published and before any large-scale annotation efforts are initiated. The guidelines specify the semantic representation in more detail than the TIMEX recognition tasks used in recent DARPA-sponsored evaluations (MUC7 1998), but are similar in that they treat temporal expressions as stand-alone targets for annotation/extraction. These guidelines are intended to support a variety of downstream applications; they are not intended to represent all the varieties of temporal information conveyed in natural language communication (the latter is a hopelessly ambitious goal, in our view). The guidelines are aimed at two sets of users:
1. Human annotators who will annotate temporal expressions in order to construct temporally annotated corpora for use by the NL community.
2. System developers who are building tagging programs to extract temporal information from documents.