Annotation Guideline Version: 0.1
Release Date: June 4, 2002
Authors: Roser Saurí, James Pustejovsky, Bob Ingria
TERQAS Annotation Working Group Members: Lisa Ferro, Marcia Lazo, David Day, Patrick Hanks, Marc Verhagen, Roser Saurí, José Castaño, Bob Ingria, James Pustejovsky.
This document describes the initial annotation guidelines for marking up text according to the TimeML language. It consolidates the modifications and additions discussed so far and presents the tag specifications for a first pass at TimeML.
For the sake of convenience, I&P(02) will be used to refer to Ingria and Pustejovsky (2002), and TIDES(02), to Ferro et al. (2002) throughout the whole document.
We will consider events a cover term for situations that happen or occur. Events can be punctual (1-2) or last for a period of time (3-4):
We will also consider as events those predicates describing states or circumstances in which something obtains or holds true. However, they will be annotated only when they identifiably change over the course of the document being marked up. For instance, in the following example, in the expression the Aeroflot Airbus, the relationship indicating that the Airbus is run and operated by Aeroflot is not a State in the desired sense. Rather, because it persists throughout the event line of the document, we factor it out and do not mark it up. On the other hand, properties that are known to change during the events represented or reported in an article will be marked as States, as illustrated below. Assume for now that events have only an id attribute:
All 75 people <EVENT stid=1> on board </EVENT> the Aeroflot Airbus <EVENT eid=5> died </EVENT>
Events are generally expressed by means of tensed and untensed verbs (1 and 2), nominalizations (3), adjectives (4), predicative clauses (5), or prepositional phrases (6):
A fresh flow of lava, gas and debris <EVENT eid=1> erupted </EVENT> there Saturday.
However, formally complex events may be sequentially discontinuous in some contexts:
Israel has been scrambling to buy more masks abroad.
Additional distribution centers would be set up next week.
The young industry's rapid growth also is attracting regulators eager to police its many facets.
Several pro-Iraq demonstrations have taken place in the last week.
This proposal takes into consideration the relevance of both verbal and nominal heads with respect to the different kinds of eventive information they convey. The two tagged events will be related as IDENTICAL by the 'relType' attribute in the LINK tag (see section 2.5).
They will definitely take into consideration our readiness.
There is no reason why we would not be prepared.
If, in spite of everything, we will not be ready, we will ask the United States to delay the operation.
All 75 people on board the Aeroflot Airbus died.
Non-optional attribute. Each event belongs to one of the following classes:
Israel has been scrambling to buy more masks abroad, after a shortage of several hundred thousand gas masks, including those for young children, was discovered.
"There is no reason why we would not be prepared," Mordechai told the Yediot Ahronot daily.
No injuries were reported over the weekend.
The agencies fear they will be unable to crack those codes to eavesdrop on spies and crooks.
Punongbayan said that the 4,795-foot-high volcano was spewing gases up to 1,800 degrees.
No injuries were reported over the weekend.
A couple of examples:
The volcano began showing signs of activity in April for the first time in 600 years,...
all non-essential personnel should begin evacuating the sprawling base.
Here, a member of a closed class of predicates is able to
select a verbal or nominal complement as an argument and mark
that event with the function (designation) associated with one
of the facets above. See the section on the LINK tag for the
annotation of the relation between the 2 events.
Some examples are: believe, think, expect, suspect, fear, want, hope, appear, seem, etc.
"We're expecting a major eruption," he said in a telephone interview early today.
We also consider as Intensional Events those "world-creating" predicates that can be regarded as states; they will therefore not be classified as STATES:
The agencies fear they will be unable to crack those codes to eavesdrop on spies and crooks.
It is conceivable that a larger eruption will take place in few hours.
See section 2.5 about how to relate the Intensional Event and the event that it introduces as its complement.
He said he was sure that a larger eruption would happen.
Companies such as Microsoft or a combined worldcom MCI are trying to monopolize Internet access.
Israel will ask the United States to delay a military strike against Iraq.
Israel has been scrambling to buy more masks abroad.
Germany has agreed to lend/(irrealis) Israel 180,000 protective kits against chemical and biological weapons, and Switzerland offered to lend/(irrealis) Israel another 25,000 masks.
They were ordered to take/(irrealis) along important papers.
The Defense Ministry said 16 planes have landed so far with protective equipment against biological and chemical warfare.
Mordechai said all the gas masks from abroad would arrive soon and be distributed to the public, adding that additional distribution centers would be set up next week.
Two moderate eruptions shortly before 3 p.m. Sunday appeared to signal a larger explosion.
Non-optional attribute. Because verbal tense is valuable information for determining the temporal ordering of a related set of events, we will annotate it wherever it occurs. Tense does not apply to nominalizations and non-finite clauses; in those cases, the appropriate value is 'NONE' (see below). The possible values for 'Tense' are:
No injuries were reported over the weekend.
Villagers from a 12-mile radius of the mountain fled the area on foot.
The young industry's rapid growth also is attracting regulators eager to police its many facets.
The agencies fear they will be unable to crack those codes to eavesdrop on spies and crooks.
"Anything along its path will be destroyed."
The agencies fear they will be unable to crack those codes to eavesdrop on spies and crooks.
Germany has agreed to lend Israel 180,000 protective kits against chemical and biological weapons, and Switzerland offered to lend Israel another 25,000 masks.
They were ordered to take along important papers.
Note here that the irrealis character of the event depends upon the
situation being described and not on the formal features of
the verbal predicate.
The evacuation was to take four hours, he said.
Officials ordered local Americans to pack bags and rehearse evacuation procedures.
Optional attribute. Whenever possible, we will mark whether the aspect of the verb is perfective or progressive.
Philippines officials earlier had ordered the evacuation of more than 11,000 people.
Anything along its path will be destroyed.
The volcano began showing signs of activity in April for the first time in 600 years,...
The young industry's rapid growth also is attracting regulators eager to police its many facets.
Non-optional attribute. Two values are possible here:
Villagers from a 12-mile radius of the mountain fled the area on foot.
All 75 people on board the Aeroflot Airbus died.
"There is no reason why we would not be prepared," Mordechai told the Yediot Ahronot daily.
No injuries were reported over the weekend.
Palestinian police prevented a planned pro-Iraq rally by the Palestinian Professionals' Union.
attributes ::= eid class tense [aspect] polarity
eid ::= <integer>
class ::= 'OCCURRENCE' | 'STATE' | 'REPORTING' | 'ASPECTUAL' | 'INTENDING'
tense ::= 'PAST' | 'PRESENT' | 'FUTURE' | 'IRREALIS' | 'NONE'
aspect ::= 'PROGRESSIVE' | 'PERFECTIVE'
polarity ::= 'POSITIVE' | 'NEGATIVE'
The young industry's rapid <EVENT eid=1 class=OCCURRENCE tense=NONE polarity=POSITIVE> growth </EVENT> also is <EVENT eid=2 class=OCCURRENCE tense=PRESENT aspect=PROGRESSIVE polarity=POSITIVE> attracting </EVENT> regulators <EVENT eid=4 class=STATE polarity=POSITIVE> eager </EVENT> to <EVENT eid=5 class=OCCURRENCE tense=IRREALIS polarity=POSITIVE> police </EVENT> its many facets.
Several pro-Iraq <EVENT eid=1 class=OCCURRENCE tense=NONE polarity=POSITIVE> demonstrations </EVENT> have <EVENT eid=2 class=OCCURRENCE tense=PAST aspect=PERFECTIVE polarity=POSITIVE> taken </EVENT> place in the last week.
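The EVENT attribute grammar can be checked mechanically. What follows is a minimal sketch in Python, not part of the guidelines or of any annotation tool: the function name check_event and the dictionary representation of an EVENT's attributes are assumptions made for illustration. It reads the grammar strictly, so eid, class, tense and polarity are required and aspect is optional.

```python
# Allowed values copied from the EVENT attribute grammar.
EVENT_CLASSES = {"OCCURRENCE", "STATE", "REPORTING", "ASPECTUAL", "INTENDING"}
TENSES = {"PAST", "PRESENT", "FUTURE", "IRREALIS", "NONE"}
ASPECTS = {"PROGRESSIVE", "PERFECTIVE"}
POLARITIES = {"POSITIVE", "NEGATIVE"}

# Attributes the grammar lists without square brackets.
REQUIRED = ("eid", "class", "tense", "polarity")


def check_event(attrs):
    """Return a list of problems in one EVENT's attribute dict (hypothetical helper)."""
    problems = [f"missing required attribute: {name}" for name in REQUIRED if name not in attrs]
    if "eid" in attrs and not str(attrs["eid"]).isdigit():
        problems.append("eid must be an integer")
    allowed = {"class": EVENT_CLASSES, "tense": TENSES, "aspect": ASPECTS, "polarity": POLARITIES}
    for name, values in allowed.items():
        if name in attrs and attrs[name] not in values:
            problems.append(f"invalid {name}: {attrs[name]}")
    return problems


# The 'attracting' event from the example above.
attracting = {"eid": 2, "class": "OCCURRENCE", "tense": "PRESENT",
              "aspect": "PROGRESSIVE", "polarity": "POSITIVE"}
print(check_event(attracting))  # []
```

Under this strict reading, the event eager in the first example, which carries no tense attribute, would be reported as missing a required attribute. Whether STATE events should carry tense=NONE is a question for the guidelines, not something the sketch decides.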
(WARNING: This section is subject to revision. Most of the specifications given here require common agreement. The section is based on the TIDES(02) treatment of temporal expressions. However, a deeper study of that document, and the corresponding tuning of the current section, is still required.)
To remain as compliant as possible with the TIDES TIMEX2 annotation, the TIMEX3 tag will be applied to TIMEX2 markable expressions. See TIDES(02) for detailed information about the particular expressions we intend to cover.
However, TimeML will differ from TIDES in the following respects (the examples given below are adapted from TIDES(02)):
See TIDES(02), appendix C, for a detailed list of these elements.
We want to separate these elements from the time expression in order to avoid potential problems with expressions like early last month or at least last week. TimeML marks last as a SIGNAL (see section 2.4). Considering then the TIMEX2 MOD markable element (early, at least) as part of the time expression would cause a fragmented sequence:
early <TIMEX3 eid=1> (part I) last <SIGNAL> month <TIMEX3 eid=1> (part II)
That problem doesn't seem to apply to temporal expressions with quantifier modifiers. However, we keep them separated from the TIMEX3 markable string in order to facilitate a straightforward translation from TIDES TIMEX2 to TimeML TIMEX3.
We will annotate TIMEX2 MOD markable elements as SIGNALS (section 2.4 below).
Such treatment, together with the introduction of TEMPORAL FUNCTIONs
(section 2.6), guarantees a more compositional approach
and thus ensures a better process of reasoning.
Again, this treatment is fundamental in order to ensure the reasoning
capability of the system.
five days (after he came back)
***
the future (of our peoples)
nearly four decades (of experience)
months (of renewed hostility)
a historic day (for the European enterprise)
the second-best quarter (ever)
twelve o'clock midnight
Note that in the second of December the article won't be annotated as part of the temporal expression. On the other hand, second should be included in it (and not treated as a SIGNAL) since it is interpreted differently from the use of second in the second week of July, where the ordinal is a modifier of week.
Friday evening
Tuesday the 18th
twelve o'clock January 3, 1984
the second of December
October of 1963
summer of 1964
eleven in the morning
I'm leaving on vacation two weeks from next Tuesday.
TimeML will annotate each of these temporal-denoting strings as having two temporal expressions (two weeks and Tuesday, and three years and today), which are related by means of a FUNCTION tag. The mark-up for the first instance would be as follows:
A major earthquake struck Los Angeles three years ago today.
***
This year's summer was unusually hot.
In this case, the tag relating the two temporal expressions (year and summer) is of type LINK:
***
I tutored an English student some Thursdays in 1998.
The concert is at 8:00 p.m. on Friday.
The concert is Friday at 8:00 p.m.
Some examples of TIMEX3 markable expressions are the following:
October 1, 1999
9 a.m. Friday, October 1, 1999
Fall 1999
ten minutes to three
now, nowadays, yesterday, currently, tomorrow, lately, etc.
That is, tokens that occupy the entire value of VAL in TIDES guidelines (see TIDES(02), appendix B2), with the exception of the adjectives current, present, future, past, former, late, etc., which will be tagged as signals so that the appropriate temporal functions can be applied.
The annotation strategy that will be applied to them is exactly as detailed in TIDES(02), p. 14.
A thorough analysis of how much we want to adopt from the TIDES annotation guidelines is still pending. All the specifications given above are provisional and require approval.
Non-optional attribute. Each temporal expression belongs to one of the following types:
the second of December
October of 1963
summer of 1964
Tuesday the 18th
November 1943
Fall 1998
This year's summer
two weeks from next Tuesday (in I'm leaving on vacation two weeks from next Tuesday)
ten minutes to three
five till eight
twenty after twelve
half past noon
eleven in the morning
twelve o'clock January 3, 1984
9 a.m. Friday, October 1, 1999
the morning of January 31
CalDate follows TIDES in using a modified version of the ISO 8601 time standard. The annotator should enter the value with the help of the annotation tool.
If the calDate value is not given by the temporal expression itself (e.g., two days ago vs. October 1, 1999), the value will be computed by means of a TEMPORAL FUNCTION, which will also be designed with the help of the annotation tool. This follows from the use of TEMPORAL FUNCTIONS in I&P(02).
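To make the idea concrete, the following is a minimal sketch of how a temporal function might compute a value for an expression such as two days ago. It is an illustration only: the function name days_ago, the choice of the document creation date as anchor, and the plain ISO 8601 output format (YYYY-MM-DD) are all assumptions. The actual calDate format follows TIDES, and the actual functions are those of I&P(02).

```python
from datetime import date, timedelta


def days_ago(anchor: date, n: int) -> str:
    """Hypothetical temporal function: resolve 'n days ago' against an anchor date."""
    # Plain ISO 8601 is assumed here; the guidelines use a TIDES-modified variant.
    return (anchor - timedelta(days=n)).isoformat()


# Assumption: the anchor is the article's creation date (March 27, 1996 is used
# as a sample date, matching the DocCreationTime example in this document).
doc_creation = date(1996, 3, 27)
print(days_ago(doc_creation, 2))  # 1996-03-25
```

An expression that already states its value, such as October 1, 1999, needs no such computation.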
attributes ::= tid type calDate
tid ::= <integer>
type ::= 'DATE' | 'TIME' | 'COMPLEX'
calDate ::= PCDATA
The DocCreationTime tag marks up the date of the article (DOA, in the STAG terminology). Since this is simply a temporal expression (TIMEX2 markable), it will be annotated as such. However, we will wrap it with the 'DocCreationTime' tag in order to preserve the distinguished status of this temporal expression as the primary temporal anchor of the article.
This is the only case in which we will use embedded tags: the date of the article has to be initially marked with the TIMEX3 tag, and then the whole structure wrapped up with the DocCreationTime tag:
<DocCreationTime> <TIMEX3 tid=1 type=DATE calDate=03271996> 03-27-96 </TIMEX3> </DocCreationTime>
DocCreationTime has no attributes.
the early 1960s, late last night, ...
the second Sunday after Christmas, last week, ...
Be it formally simple (e.g., before) or complex (no more than, end of, etc.), it will be annotated as a unit:
<SIGNAL signalID=1> no more than </SIGNAL> <TIMEX3> two weeks </TIMEX3>
However, we should be careful not to include two different signals under the same tag. This is particularly relevant for applying temporal functions consistently. Therefore, the sequence before last week must not be tagged like this:
<SIGNAL sid=1> before last </SIGNAL> <TIMEX3 tid=1 type=DATE> week </TIMEX3>
but like this:
<SIGNAL sid=1> before </SIGNAL> <SIGNAL sid=2> last </SIGNAL> <TIMEX3 tid=1 type=DATE> week </TIMEX3>
In that example, last is the signal that helps in the temporal anchoring of week, whereas before relates the 'calDate' value that should hold for (last) week with another timex or event (cf. We ordered the ticket before last week.)
(N.B. How the 'calDate' value for (last) week is obtained will be discussed in section 2.6 on temporal functions.)
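The rule that a complex signal is one unit, while two distinct signals get two tags, can be sketched as a longest-match lookup. This is a toy illustration under stated assumptions: the lexicon SIGNALS and the function tag_phrase are hypothetical, the TIMEX3 attributes are omitted, and real annotation is done by a person with the annotation tool, not by a word list.

```python
# Hypothetical signal lexicon, for illustration only.
SIGNALS = ["no more than", "end of", "before", "last"]


def tag_phrase(phrase, signals=SIGNALS):
    """Wrap each leading signal in its own SIGNAL tag; the rest becomes the TIMEX3 string."""
    words = phrase.split()
    out, sid, i = [], 1, 0
    # Try longer lexicon entries first so 'no more than' stays a single unit.
    ordered = sorted((s.split() for s in signals), key=len, reverse=True)
    while i < len(words):
        match = next((s for s in ordered if words[i:i + len(s)] == s), None)
        if match is None:
            break  # first non-signal word: the temporal expression starts here
        out.append(f"<SIGNAL sid={sid}> {' '.join(match)} </SIGNAL>")
        sid += 1
        i += len(match)
    if i < len(words):
        # TIMEX3 attributes (tid, type, calDate) are left out of this sketch.
        out.append(f"<TIMEX3> {' '.join(words[i:])} </TIMEX3>")
    return " ".join(out)


print(tag_phrase("before last week"))
print(tag_phrase("no more than two weeks"))
```

For before last week this yields two SIGNAL elements followed by the TIMEX3 for week; for no more than two weeks it yields a single SIGNAL, since the whole modifier is one lexicon entry.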
For now, SIGNAL has only one attribute: 'signalID'. It is automatically assigned by the annotation tool each time a SIGNAL is marked up.
attributes ::= sid
sid ::= <integer>
attributes ::= (eventID | timeID) [signalID] (relatedToEvent | relatedToTime) relType magnitude
eventID ::= <integer>
timeID ::= <integer>
signalID ::= <integer>
relatedToEvent ::= <integer>
relatedToTime ::= <integer>
*relType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' | 'SIMULTANEOUS' | 'IAFTER' | 'IBEFORE' | 'ID' | 'INITIATES' | 'CULMINATES' | 'TERMINATES' | 'CONTINUES'
magnitude ::= <integer>
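The two parenthesized alternatives in this grammar are easy to get wrong by hand, so a small check may help. The sketch below is an assumption-laden illustration, not a tool from the guidelines: check_link and exactly_one are hypothetical names, attributes are represented as a plain dictionary, and the grammar is read as written (exactly one source, exactly one target, relType and magnitude required, signalID optional).

```python
# relType values copied from the LINK attribute grammar.
REL_TYPES = {
    "BEFORE", "AFTER", "INCLUDES", "IS_INCLUDED", "SIMULTANEOUS", "IAFTER",
    "IBEFORE", "ID", "INITIATES", "CULMINATES", "TERMINATES", "CONTINUES",
}


def exactly_one(attrs, a, b):
    """True when exactly one of the two alternative attributes is present."""
    return (a in attrs) != (b in attrs)


def check_link(attrs):
    """Return a list of problems in one LINK's attribute dict (hypothetical helper)."""
    problems = []
    if not exactly_one(attrs, "eventID", "timeID"):
        problems.append("need exactly one of eventID, timeID")
    if not exactly_one(attrs, "relatedToEvent", "relatedToTime"):
        problems.append("need exactly one of relatedToEvent, relatedToTime")
    if attrs.get("relType") not in REL_TYPES:
        problems.append("relType missing or not in the listed values")
    if "magnitude" not in attrs:
        problems.append("missing magnitude")
    return problems


# An event related to a time, with no signal.
print(check_link({"eventID": 1, "relatedToTime": 2, "relType": "BEFORE", "magnitude": 1}))  # []
```

The check does not verify that the referenced ids exist in the document; that would require the full set of EVENT and TIMEX3 annotations.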