TimeML Historical Specification

TimeML Version: 0.2

Release Date: June 4, 2002

Authors: Bob Ingria and James Pustejovsky

TERQAS TimeML Working Group Members: Branimir Boguraev, Michael Bukatin, Jose Castano, John Frank, Rob Gaizauskas, Bob Ingria, Graham Katz, Andy Latto, Inderjeet Mani, James Pustejovsky, Erik Rauch, Antonio Sanfilippo, Roser Sauri.

1.0 Introduction

This document describes the initial specification of the markup language for temporal and event expressions in text being developed by the TERQAS TimeML Working Group. The group started with two major design goals: (1) to use the core of Andrea Setzer's annotation framework (Setzer, 2001), which was christened STAG (Sheffield Temporal Annotation Guidelines); and (2) to remain, as much as possible, compliant with the TIDES TIMEX2 annotation effort (Ferro et al., 2002). The full set of changes made from STAG and the TIDES TIMEX2 guidelines is as follows.

  1. Introduce a LINK tag: an object that links events/times to events/times but consumes no input text.
  2. Introduce a STATE tag: annotate only states that are updated in the context of the narrative being tagged.
    Any state persistent throughout the entire article would not be tagged as a state.
  3. Enrich temporal relations: add immediately-after (IAFTER) and immediately-before (IBEFORE).
  4. Introduce scale as a relation attribute: we need to convert preexisting Timex data into the TimeML standard.
  5. Introduce event identity (ID).
  6. Add NONE as a value for the tense attribute.
  7. Add temporal functions for doing temporal math on expressions such as "last week".
    Track this as an enrichment over the TIDES guidelines.
  8. To distinguish the fact that the TIMEX tag in TimeML is different from both Setzer's TIMEX tag and the TIMEX2 tag, rename it TIMEX3.
  9. Introduce a DURATION tag, distinct from TIMEX3, for expressions such as "two months".
  10. Replace Setzer's DOA (Date of Article) tag with TIMEX3, since everything tagged DOA is a TIMEX3 expression. To distinguish this TIMEX3 expression from all others, wrap it with a special tag, DocCreationTime.

The working group also discussed the following features, although in less detail. None are currently features of TimeML.

  1. Possibly identify "Event Clusters" or "time frames" in a document. For example, the events and situations in a document will often cluster around a pivot or central occurrence. Other pivots might include the actual reporting event itself. Hence, there are different layers of event clusters. Identifying clusters would be useful for grouping related events in a narrative and for temporal segmentation of the narrative, and would reduce the number of temporal relations that need to be annotated. It may also prove useful to explore relative to the Text Segmented Closure algorithm being developed.
  2. Brief discussion of negation and modality. One suggestion is to use a polarity attribute on negative propositional content:
    1. The plane did not crash.
    2. No survivors were found.
  3. Enrich the Event Typology to improve temporal inference.
    This is related to the next point.
  4. Add hooks to the event ontology for event entailment operations.
  5. Include event and time closure operations as part of TimeML.
  6. It was suggested that the head verb should not be annotated as an EVENT but rather as a signal to the event.
  7. Along the same lines, it was also suggested that aspect should be annotated as a signal, rather than as an attribute of EVENT.
    In combination with the preceding point, this would mean that the <LINK> tag (see below) would contain all the semantic information in the annotation.
  8. Introduce init and cul attributes to events, or else reify init and cul as events, to handle aspectual events:
    1. The party will begin at noon.
    2. The man began the lecture at noon.

In the remainder of the document, we outline a BNF for TimeML, as an initial step towards specifying a complete XML schema for the language, in addition to a DTD. An XML schema is preferable to a DTD because it provides a richer initial set of data types for constraining the values of attributes, and also provides a mechanism for adding user-defined types.

In order to record the decisions that were made in creating TimeML, and to justify the areas in which it diverges from STAG and TIDES, the structure of this document recapitulates the changes that were made at the various meetings of the TimeML working group and discussions with TIDES annotators. Section 2 presents the additions that were made during the first meeting of the TimeML working group (March 11-15, 2002); section 3 presents the additions that were made during the second meeting (April 22-26, 2002); and section 4 presents changes suggested during an "annotation fest" incorporating members of the TimeML working group, corpus WG, and TIDES annotators (May 8 and May 15, 2002) in which the participants applied TimeML annotation to various texts, as a group.

2.0 From STAG to TimeML---TimeML Phase 1

2.1 Tags and Attributes for STAG

This section presents a BNF for STAG, the temporal annotation language presented in Andrea Setzer's thesis (Setzer, 2001). Consideration of the details of this BNF, in conjunction with problems raised in trying to apply it to actual texts, resulted in several changes and extensions to Setzer's original framework. Detailing these issues will help justify the details of this initial pass at TimeML.

A note on attribute values:
All attribute values are marked as 'potential values' in Setzer's thesis; those marked with * may need to have a larger, but still closed, set specified. Those not so marked may be sufficient as is.

<EVENT>

attributes ::= eid class [argEvent] [tense] [aspect] [([signalID] relatedToEvent eventRelType) | ([signalID] relatedToTime timeRelType)]
//* N.B. argEvent is dependent on class='PERCEPTION', 'REPORTING', or 'ASPECTUAL'
eid ::= <integer>
*class ::= 'OCCURRENCE' | 'PERCEPTION' | 'REPORTING' | 'ASPECTUAL'
argEvent ::= <integer>
tense ::= 'PAST' | 'PRESENT' | 'FUTURE'
aspect ::= 'PROGRESSIVE' | 'PERFECTIVE'
signalID ::= <integer>
relatedToEvent ::= <integer>
*eventRelType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' | 'SIMULTANEOUS'
relatedToTime ::= <integer>
*timeRelType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' | 'SIMULTANEOUS'

<TIMEX>

attributes ::= tid type calDate [(eid signalID relType)]
//* calDate is limited to [[DD]MM]YYYY | ('SPR'|'SUM'|'AUT'|'WIN')YYYY
//* A standard SGML or XML DTD cannot represent this, but an XML schema can
tid ::= <integer>
*type ::= 'DATE' | 'TIME' | 'COMPLEX'
calDate ::= PCDATA
eid ::= <integer>
signalID ::= <integer>
*relType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' | 'SIMULTANEOUS'

<SIGNAL>

attributes ::= sid
sid ::= <integer>

<DOA>

'DOA' stands for 'Date of article'.

No attributes
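
The restriction on calDate noted in the TIMEX comments can be approximated with a regular expression. The following Python sketch is illustrative only and is not part of STAG: the function name is_valid_cal_date is hypothetical, and the range checks on day and month are an assumption that goes beyond the bare pattern given in the BNF comment.

```python
import re

# [[DD]MM]YYYY: an optional month, itself optionally preceded by a day.
_NUMERIC = re.compile(r"^(?:(?P<day>\d{2})?(?P<month>\d{2}))?(?P<year>\d{4})$")
# ('SPR'|'SUM'|'AUT'|'WIN')YYYY: a season code followed by a year.
_SEASON = re.compile(r"^(?:SPR|SUM|AUT|WIN)\d{4}$")


def is_valid_cal_date(value: str) -> bool:
    """Check a calDate string against the pattern stated in the BNF comment."""
    if _SEASON.match(value):
        return True
    match = _NUMERIC.match(value)
    if match is None:
        return False
    month, day = match.group("month"), match.group("day")
    # Assumed sanity checks: months run 01-12 and days 01-31.
    if month is not None and not 1 <= int(month) <= 12:
        return False
    if day is not None and not 1 <= int(day) <= 31:
        return False
    return True
```

A check of this kind is what a DTD cannot express but an XML schema pattern facet can.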

2.2 Comments and Extensions

One thing that is striking in looking at this BNF is this fragment of the attribute structure of EVENT:

[([signalID] relatedToEvent eventRelType) | ([signalID] relatedToTime timeRelType)]

In each case, we are dealing not with three unrelated attributes, but with three attributes that only make sense as a unit. The same triad also appears in the attribute structure of TIMEX:

[(eid signalID relType)]

Moreover, as the specification of the values for the eventRelType and timeRelType attributes of EVENT and the relType attribute of TIMEX indicates, we are really dealing with one property, whose values are specified three times. This is forced in the case of eventRelType and timeRelType for EVENT by virtue of the fact that only the name of the attribute can link it to relatedToEvent or relatedToTime, respectively. And, of course, since relType is defined on TIMEX, not EVENT, it must repeat the specification of permissible values.

All these considerations suggest that these triplets of attributes should be factored out into the form of a new abstract tag (i.e. one which consumes no input text). This would formally express the fact that these attributes are linked, allow eventRelType, timeRelType and relType to be collapsed into a single attribute, and allow the specification of the possible values of this single attribute to be stated only once.

[Note: Of course in BNF (or in an XML DTD) it would be possible to specify an abstract element as the value of eventRelType, timeRelType, and relType and thus state their possible values only once, but we would still be left with the fact that the inherent relation between a signalID, relatedID, and relType would be unexpressed in the STAG annotation language.]

For these reasons, we remove the cited triplets from the definition of EVENT and TIMEX and introduce the tag:

<LINK>

attributes ::= (eventID | timeID) [signalID] (relatedToEvent | relatedToTime) relType
eventID ::= <integer>
timeID ::= <integer>
signalID ::= <integer>
relatedToEvent ::= <integer>
relatedToTime ::= <integer>
*relType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' | 'SIMULTANEOUS'

eventID and timeID are used to anchor the link to an EVENT or TIMEX (the element that would have contained the [signalID] (relatedToEvent | relatedToTime) relType triple before it was factored out into LINK). Note that factoring out this triplet also entails that the decision on where to record this information is now no longer arbitrary. Previously, the information could be recorded on either of two related events, but there was no principle to decide which event should contain this information.
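
One way to see what the factoring buys is to model LINK as a record in its own right. The sketch below is a hypothetical Python rendering, not part of the specification: the class name Link and its field names are assumptions that mirror the attribute names in the BNF, and the checks enforce that exactly one anchor and exactly one related element are given.

```python
from dataclasses import dataclass
from typing import Optional

# The relType values are stated once, as in the LINK definition above.
REL_TYPES = {"BEFORE", "AFTER", "INCLUDES", "IS_INCLUDED", "SIMULTANEOUS"}


@dataclass(frozen=True)
class Link:
    rel_type: str
    event_id: Optional[int] = None
    time_id: Optional[int] = None
    signal_id: Optional[int] = None
    related_to_event: Optional[int] = None
    related_to_time: Optional[int] = None

    def __post_init__(self) -> None:
        # (eventID | timeID): exactly one anchor.
        if (self.event_id is None) == (self.time_id is None):
            raise ValueError("exactly one of eventID or timeID is required")
        # (relatedToEvent | relatedToTime): exactly one related element.
        if (self.related_to_event is None) == (self.related_to_time is None):
            raise ValueError("exactly one of relatedToEvent or relatedToTime is required")
        if self.rel_type not in REL_TYPES:
            raise ValueError(f"unknown relType: {self.rel_type}")


# An event anchored to another event, with no signal.
link = Link("BEFORE", event_id=2, related_to_event=1)
```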

2.2.1 Document Annotation in an Extension of STAG

In addition to purely formal considerations of the geometry of STAG (essentially, refactoring considerations, in the sense of Fowler (1999) and many others), the TimeML working group also found empirical considerations motivating the addition of the LINK tag. The original STAG framework had been designed with the presupposition that any given EVENT would be related to at most one other EVENT or other indexed element. Attempts to annotate various newswire articles showed that this assumption was false, and that a single EVENT could be related to more than one other indexed element. Here is one such example:

FAMILIES SUE OVER AEROFLOT CRASH DEATHS

   The Russian airline Aeroflot has been
<EVENT eid=1 relatedToTime=1 timeRelType=BEFORE tense=PRESENT aspect=PERFECTIVE class=OCCURRENCE>
hit
</EVENT>
with a writ for loss and damages,
<EVENT eid=2 tense=NONE aspect=PERFECTIVE relatedToEvent=1 eventRelType=BEFORE class=OCCURRENCE>
filed
</EVENT>
in Hong Kong by the families of seven passengers
<EVENT eid=3 tense=NONE aspect=PERFECTIVE relatedToEvent=2 eventRelType=BEFORE
class=OCCURRENCE relatedToEvent2=4 eventRel2Type=IS_INCLUDED signal2=1>
killed
</EVENT>
<SIGNAL sid=1>
in
</SIGNAL>
an air
<EVENT eid=4 class=OCCURRENCE>
crash
</EVENT>.

   All 75 people
<STATE stid=1 relatedToEvent=5 eventRelType=INCLUDES>
on board
</STATE>
the Aeroflot Airbus
<EVENT eid=5 tense=PAST aspect=PERFECTIVE relatedToEvent=6 eventRelType=IAFTER signal=2>
died
</EVENT>
<SIGNAL sid=2>
when
</SIGNAL>
it
<EVENT eid=6 tense=PAST aspect=PERFECTIVE relatedToTime=2 timeRelType=IS_INCLUDED relatedToEvent=4 eventRelType=ID>
ploughed
</EVENT>
into a Siberian mountain
<SIGNAL sid=3>
in
</SIGNAL>
<TIMEX tid=2 type=DATE calDate=031994>
March 1994
</TIMEX>.

...

<DOA tid=1>
03-27-96
</DOA>

There are several notable features of this annotation. First, notice this <EVENT> element:

<EVENT eid=3 tense=NONE aspect=PERFECTIVE relatedToEvent=2 eventRelType=BEFORE
class=OCCURRENCE relatedToEvent2=4 eventRel2Type=IS_INCLUDED signal2=1>
killed
</EVENT>

which features the addition of the ad hoc attributes relatedToEvent2 and eventRel2Type as an attempt to allow multiple related events, along with the new attribute signal2, to link the signal to this related event and no other. Clearly this solution is fragile, in that it requires either a cut-off of related events at some arbitrary number or else an open-ended set of triplets of the form:

signalN, relatedToEventN, and eventRelNType

Note that the LINK tag introduced above solves both the problem of the potentially unbounded number of related events and that of relating a particular signal to a given related EVENT (or other indexed element).

2.2.2 Document Annotation with LINK

Given the existence of the LINK tag, we can rewrite the above annotation as follows:

FAMILIES SUE OVER AEROFLOT CRASH DEATHS

   The Russian airline Aeroflot has been
<EVENT eid=1 tense=PRESENT aspect=PERFECTIVE class=OCCURRENCE>
hit
</EVENT>
<LINK eventID=1 relatedToTime=1 relType=BEFORE/>
with a writ for loss and damages,
<EVENT eid=2 tense=NONE aspect=PERFECTIVE class=OCCURRENCE>
filed
</EVENT>
<LINK eventID=2 relatedToEvent=1 relType=BEFORE/>
in Hong Kong by the families of seven passengers
<EVENT eid=3 tense=NONE aspect=PERFECTIVE class=OCCURRENCE>
killed
</EVENT>
<LINK eventID=3 relatedToEvent=2 relType=BEFORE/>
<LINK eventID=3 signalID=1 relatedToEvent=4 relType=IS_INCLUDED/>
<SIGNAL sid=1>
in
</SIGNAL>
an air
<EVENT eid=4 class=OCCURRENCE>
crash
</EVENT>.

   All 75 people
<STATE stid=1>
on board
</STATE>
<LINK stateID=1 relatedToEvent=5 relType=INCLUDES/>
the Aeroflot Airbus
<EVENT eid=5 tense=PAST aspect=PERFECTIVE>
died
</EVENT>
<LINK eventID=5 signalID=2 relatedToEvent=6 relType=IAFTER/>
<SIGNAL sid=2>
when
</SIGNAL>
it
<EVENT eid=6 tense=PAST aspect=PERFECTIVE>
ploughed
</EVENT>
<LINK eventID=6 relatedToTime=2 relType=IS_INCLUDED/>
<LINK eventID=6 relatedToEvent=4 relType=ID/>
into a Siberian mountain
<SIGNAL sid=3>
in
</SIGNAL>
<TIMEX tid=2 type=DATE calDate=031994>
March 1994
</TIMEX>.

...

<DOA tid=1>
03-27-96
</DOA>

In addition to the abstraction that LINK provides, there are several other additions exhibited in the annotation presented above.
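
Because LINK tags stand apart from the text they relate, they can be collected mechanically. The following Python sketch is an illustration under stated assumptions: the sample annotations use unquoted attribute values and so are not well-formed XML, which is why a regular expression is used instead of an XML parser, and the function names parse_links and links_by_event are hypothetical.

```python
import re
from collections import defaultdict

# A fragment of the annotation above: one EVENT carrying two relations.
SAMPLE = """
<EVENT eid=3 tense=NONE aspect=PERFECTIVE class=OCCURRENCE>
killed
</EVENT>
<LINK eventID=3 relatedToEvent=2 relType=BEFORE/>
<LINK eventID=3 signalID=1 relatedToEvent=4 relType=IS_INCLUDED/>
"""

LINK_TAG = re.compile(r"<LINK\s+([^>]*?)/>")
ATTRIBUTE = re.compile(r"(\w+)=(\w+)")


def parse_links(text):
    """Return one attribute dictionary per LINK tag found in the text."""
    return [dict(ATTRIBUTE.findall(m.group(1))) for m in LINK_TAG.finditer(text)]


def links_by_event(text):
    """Group LINKs by the EVENT they are anchored to."""
    grouped = defaultdict(list)
    for link in parse_links(text):
        if "eventID" in link:
            grouped[link["eventID"]].append(link)
    return dict(grouped)


grouped = links_by_event(SAMPLE)
```

Here grouped["3"] holds both relations for the event "killed", with no need for numbered attributes.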

2.3 Further Additions to STAG 1: Modifications to Existing Tags and Attributes

<EVENT>

The tense attribute adds the value:

'NONE'
for untensed verb forms, such as participles, etc.

<DOA>

DOA previously had no attributes; it now adds the attribute

tid ::= <integer>
which allows the DOA to serve as a temporal anchor for the entire article

<LINK>

The relType attribute adds the values:

'IAFTER'
immediately after
'IBEFORE'
immediately before
'ID'
identity (of events)

2.3 Further Additions to STAG 2: New Tags

<STATE>

attributes ::= stid
stid ::= <integer>

The TimeML working group found that, in addition to annotating events, it is also useful to annotate a select subset of states. We have decided to recognize for markup only those states which are identifiably changed over the course of the document being marked up. For example, in the article annotated above, the expression "the Aeroflot Airbus" implies a relationship in which the Airbus is run and operated by Aeroflot, but this is not a State in the desired sense. Rather, because it is persistent throughout the event line of the document, we factor it out and do not mark it up. On the other hand, properties that are known to change during the events represented/reported in an article will be marked as States, as illustrated below:

All 75 people
<STATE stid=1>
on board
</STATE>
<LINK stateID=1 relatedToEvent=5 relType=INCLUDES/>
the Aeroflot Airbus
<EVENT eid=5 tense=PAST aspect=PERFECTIVE>
died
</EVENT>
<LINK eventID=5 signalID=2 relatedToEvent=6 relType=IAFTER/>
<SIGNAL sid=2>
when
</SIGNAL>

2.4 A BNF for TimeML Phase 1

Putting together all the modifications and additions discussed above gives us the following BNF for a first pass at TimeML.

<EVENT>

attributes ::= eid class [argEvent] [tense] [aspect]
//* N.B. argEvent is dependent on class='REPORTING'
eid ::= <integer>
*class ::= 'OCCURRENCE' | 'PERCEPTION' | 'REPORTING' | 'ASPECTUAL'
argEvent ::= <integer>
tense ::= 'PAST' | 'PRESENT' | 'FUTURE' | 'NONE'
aspect ::= 'PROGRESSIVE' | 'PERFECTIVE'

<TIMEX>

attributes ::= tid type calDate
//* calDate is limited to [[DD]MM]YYYY | ('SPR'|'SUM'|'AUT'|'WIN')YYYY
//* A standard SGML or XML DTD cannot represent this, but an XML schema can
tid ::= <integer>
*type ::= 'DATE' | 'TIME' | 'COMPLEX'
calDate ::= PCDATA

<STATE>

attributes ::= stid
stid ::= <integer>

<SIGNAL>

attributes ::= sid
sid ::= <integer>

<DOA>

'DOA' stands for 'Date of article'.

attributes ::= tid
tid ::= <integer>

<LINK>

attributes ::= (eventID | timeID | stateID) [signalID] (relatedToEvent | relatedToTime) relType
eventID ::= <integer>
timeID ::= <integer>
stateID ::= <integer>
signalID ::= <integer>
relatedToEvent ::= <integer>
relatedToTime ::= <integer>
*relType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' | 'SIMULTANEOUS' | 'IAFTER' | 'IBEFORE' | 'ID'
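
The Phase 1 BNF can be transcribed into a small table and used to check individual tags. The Python sketch below is a hypothetical illustration, not an official validator: the names ATTRIBUTES, VALUES, and check_tag are assumptions, and LINK is left out because its attribute alternations need more than simple set checks.

```python
# Required and optional attributes per tag, transcribed from the Phase 1 BNF.
ATTRIBUTES = {
    "EVENT": ({"eid", "class"}, {"argEvent", "tense", "aspect"}),
    "TIMEX": ({"tid", "type", "calDate"}, set()),
    "STATE": ({"stid"}, set()),
    "SIGNAL": ({"sid"}, set()),
    "DOA": ({"tid"}, set()),
}

# Closed value sets for the enumerated attributes.
VALUES = {
    ("EVENT", "class"): {"OCCURRENCE", "PERCEPTION", "REPORTING", "ASPECTUAL"},
    ("EVENT", "tense"): {"PAST", "PRESENT", "FUTURE", "NONE"},
    ("EVENT", "aspect"): {"PROGRESSIVE", "PERFECTIVE"},
    ("TIMEX", "type"): {"DATE", "TIME", "COMPLEX"},
}


def check_tag(tag, attrs):
    """Return a list of problems found in one tag's attributes (empty if none)."""
    if tag not in ATTRIBUTES:
        return [f"unknown tag: {tag}"]
    required, optional = ATTRIBUTES[tag]
    present = set(attrs)
    problems = [f"missing attribute: {name}" for name in sorted(required - present)]
    problems += [
        f"unexpected attribute: {name}"
        for name in sorted(present - required - optional)
    ]
    for name, value in attrs.items():
        allowed = VALUES.get((tag, name))
        if allowed is not None and value not in allowed:
            problems.append(f"bad value for {name}: {value}")
    return problems
```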

3.0 TimeML Phase 2

Discussion of the initial version of TimeML, both by the TimeML working group and also in plenary sessions, revealed the need for further changes and additions.

3.1 Eliminating <DOA>

During presentations of the TimeML BNF of Section 2, representatives of TIDES pointed out that the text annotated with the DOA (Date of Article) tag was simply a TIMEX and so should be annotated as such. This led to a decision to eliminate the DOA tag and use TIMEX instead. In order to preserve the distinguished status of this temporal expression as the primary temporal anchor of the article, it was also decided to wrap the TIMEX representing the date of the article with a special tag. Since NewsML provides a number of timestamps, we decided to borrow from its repository and use DocCreationTime (DCT) for this purpose.

There was discussion of using the Prism standard directly, namely prism:publicationTime.

The following is excerpted from the Prism guidelines:

There are several times that mark the major milestones in the life of a news resource: The time the story is published, the time it may be released (if not immediately), the time it is received by a customer, and the time that the story expires (if any). Dates and times should be represented using the W3C-defined profile of ISO 8601 [W3C-NOTE-datetime].
Table 4: Elements for Time and Date Information

Element                 Role

prism:creationTime      Date and time the identified resource was first created.
prism:expirationTime    Date and time when the right to publish material expires.
prism:modificationTime  Date and time the resource was last modified.
prism:publicationTime   Date and time when the resource is released to the public.
prism:releaseTime       Earliest date and time when the resource may be distributed.
prism:receptionTime     Date and time when the resource was received on current system. 

Regarding the previous tag DOA and the new tag DocCreationTime (DCT), compare the annotations below.

Phase 1:

<DOA tid=1>
03-27-96
</DOA>

Phase 2:

<DocCreationTime>
<TIMEX tid=1 type=DATE calDate=27031996>
03-27-96
</TIMEX>
</DocCreationTime>
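
Since the Prism guidelines call for the W3C profile of ISO 8601, a document creation time such as the one above would need to be normalized at some point. The following Python sketch shows one way to do that; it assumes the article date is written as MM-DD-YY, which is inferred from the example and not stated by the specification, and the function name doa_to_iso is hypothetical.

```python
from datetime import datetime


def doa_to_iso(text: str) -> str:
    """Parse an MM-DD-YY article date and return it as YYYY-MM-DD."""
    # Python's %y maps 69-99 to 1969-1999 and 00-68 to 2000-2068.
    return datetime.strptime(text.strip(), "%m-%d-%y").date().isoformat()


iso_date = doa_to_iso("03-27-96")
```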

3.2 <TIMEX3>

Since the exact nature of the tag TimeML uses to annotate temporal expressions is different in detail both from the TIMEX tag in STAG and the TIMEX2 tag in TIDES, we have decided to rename it TIMEX3. While it is true that XML's namespace facility would permit a tag with the same name, but different usages, to exist in different namespaces (e.g. stag:timex, tides:timex, and timeML:timex), we feel that it is less confusing, as well as indicative of its origins, to name the TimeML variant TIMEX3. Moreover, the calDate attribute of STAG's TIMEX was specified as having the value

[[DD]MM]YYYY | ('SPR'|'SUM'|'AUT'|'WIN')YYYY

while we wish to follow TIDES in using a modified version of the ISO 8601 time standard. We therefore replace the calDate attribute with a value attribute in our specification of TIMEX3, along with other changes we describe in this section.

3.3 Additional Tags and Attributes 1: DURATION

It was decided to add DURATION as a separate tag, distinct from other TIMEX3 expressions.

3.4 Additional Tags and Attributes 2: Temporal Functions

One of the more complex constructions discussed by the TimeML working group involved temporal expressions where the calendar date is not referred to directly, but via an expression that acts as a temporal function over a TIMEX3 expression. Examples include:

  1. last week
  2. last Thursday
  3. the week before last
  4. next week

Temporal expressions like the above might be annotated with interpretations such as the following. (We have used pre-theoretical, but fairly transparent, names to represent the hypothetical functions. DCT is the date of the article, presupposed for concreteness.)

last week = (predecessor (week DCT))
That is, we start with a temporal anchor, in this case, the DCT, coerce it to a week, then find the week preceding it.
last Thursday = (thursday (predecessor (week DCT)))
Similar to the preceding expression, except that we pick out the day named 'thursday' in the predecessor week.
the week before last = (predecessor (predecessor (week DCT)))
Also similar to the first expression, except that we go back two weeks.
next week = (successor (week DCT))
The dual of the first expression: we start with the same coercion, but go forward instead of back.
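
As a rough illustration, the four functional expressions above can be evaluated directly on calendar dates. This Python sketch is not part of TimeML: it assumes that a week is represented by its Monday (the ISO 8601 convention), that DCT is the 03-27-96 article date from the earlier examples, and that the function names simply mirror the pre-theoretical names used above.

```python
from datetime import date, timedelta


def week(anchor: date) -> date:
    """Coerce a date to its week, represented here by that week's Monday."""
    return anchor - timedelta(days=anchor.weekday())


def predecessor(week_start: date) -> date:
    """Return the week before the given week."""
    return week_start - timedelta(weeks=1)


def successor(week_start: date) -> date:
    """Return the week after the given week."""
    return week_start + timedelta(weeks=1)


def thursday(week_start: date) -> date:
    """Pick out the Thursday of the given week (Monday + 3 days)."""
    return week_start + timedelta(days=3)


DCT = date(1996, 3, 27)  # a Wednesday

last_week = predecessor(week(DCT))
last_thursday = thursday(predecessor(week(DCT)))
week_before_last = predecessor(predecessor(week(DCT)))
next_week = successor(week(DCT))
```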

Such representations present a problem for annotation because the needed functions, which would be best expressed as XML tags, can't appear as the values of attributes in another XML tag. As always, the solution is to use a tag's ID as the value of an attribute, in place of the tag itself. Given this strategy, TimeML can define whatever functions are necessary, and pass in the id of a function (tag) whenever we want to use it as the value of an attribute. If we further allow the arguments of functions to be function IDs, we can compose functions by using their IDs as pointers (much as in the box and pointer notation for LISP). To show how this would work, we present the following sample annotations. For concreteness, we assume each appears in a document which has the following DCT:

<DocCreationTime>
<TIMEX tid=1 type=DATE value=1996-03-27>
03-27-96
</TIMEX>
</DocCreationTime>

This tag's ID provides a temporal anchor in all the examples.

(1) last week

<SIGNAL sid=1>
last
</SIGNAL>
<TIMEX3 tid=2 type=DATE valueFromFunction=tf2>
week
</TIMEX3>

<coerceTo tfid=tf1 argumentID=t1 scale=WEEK/>

<predecessor tfid=tf2 signalID=1 argumentID=tf1 value=1/>

(2) last Thursday

<SIGNAL sid=1>
last
</SIGNAL>
<TIMEX3 tid=2 type=DATE valueFromFunction=tf3>
Thursday
</TIMEX3>

<coerceTo tfid=tf1 argumentID=t1 scale=WEEK/>

<predecessor tfid=tf2 signalID=1 argumentID=tf1 value=1/>

<getNamedElementOf tfid=tf3 argumentID=tf2 value=THURSDAY/>

N.B. 'last Thursday' and 'Thursday of last week' should get the same interpretation.

(3) the week before last

<TIMEX3 tid=2 type=DATE valueFromFunction=tf2>
the week
</TIMEX3>
<SIGNAL sid=1>
before last
</SIGNAL>

<coerceTo tfid=tf1 argumentID=t1 scale=WEEK/>

<predecessor tfid=tf2 signalID=1 argumentID=tf1 value=2/>

(4) next week

<SIGNAL sid=1>
next
</SIGNAL>
<TIMEX3 tid=2 type=DATE valueFromFunction=tf2>
week
</TIMEX3>

<coerceTo tfid=tf1 argumentID=t1 scale=WEEK/>

<successor tfid=tf2 signalID=1 argumentID=tf1 value=1/>
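
The ID-passing strategy can be sketched as a small evaluator that follows pointers from one function tag to the next. The Python below is a hypothetical illustration only: the dictionary layout, the resolve function, and the choice to represent a week by its Monday are assumptions, and only the WEEK scale is handled.

```python
from datetime import date, timedelta

# Temporal anchors by ID. "t1" stands for the DCT used in the examples.
ANCHORS = {"t1": date(1996, 3, 27)}

# Function tags keyed by tfid, transcribed from example (2), "last Thursday".
FUNCTIONS = {
    "tf1": {"name": "coerceTo", "argumentID": "t1", "scale": "WEEK"},
    "tf2": {"name": "predecessor", "argumentID": "tf1", "value": 1},
    "tf3": {"name": "getNamedElementOf", "argumentID": "tf2", "value": "THURSDAY"},
}

DAYS = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY", "SUNDAY"]


def resolve(ref: str) -> date:
    """Follow ID pointers until an anchor is reached, applying each function."""
    if ref in ANCHORS:
        return ANCHORS[ref]
    tag = FUNCTIONS[ref]
    arg = resolve(tag["argumentID"])  # the argument may itself be a function ID
    name = tag["name"]
    if name == "coerceTo":
        if tag["scale"] != "WEEK":
            raise NotImplementedError("only the WEEK scale is sketched here")
        return arg - timedelta(days=arg.weekday())
    if name == "predecessor":
        return arg - timedelta(weeks=tag["value"])
    if name == "successor":
        return arg + timedelta(weeks=tag["value"])
    if name == "getNamedElementOf":
        return arg + timedelta(days=DAYS.index(tag["value"]))
    raise ValueError(f"unknown temporal function: {name}")


last_thursday = resolve("tf3")
```

The value of a TIMEX3 with valueFromFunction=tf3 would then be whatever resolve("tf3") returns.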

Even more complex functional expressions can be annotated using this technique:

John taught for the 6 months ending March 31, 2001.


John
<EVENT eid=1 tense=PAST aspect=PERFECTIVE class=OCCURRENCE>
taught
</EVENT>
<LINK eventID=1 signalID=1 relatedToTime=2 relType=COEXTENSIVE/>
<SIGNAL sid=1>
for
</SIGNAL>
<TIMEX3 tid=2 durationID=1 type=INTERVAL valueFromFunction=tf1>
the
<DURATION did=1 value=6 scale=MONTH>
6 months
</DURATION>
</TIMEX3>
<SIGNAL sid=2>
ending
</SIGNAL>
<TIMEX3 tid=1 value=2001-03-31>
March 31, 2001
</TIMEX3>
<endPoint tfid=tf1 signalID=2 durationID=1/>
<LINK temporalFunctionID=tf1 relatedToTime=1 relType=IDENTITY/>

That is, we extract the end point of the DURATION via the endPoint temporal function (tag) and equate it with the final date TIMEX3 via a LINK. This, of course, means extending LINKs to relate temporal functions, as well as assorted time expressions.

Given the ability of LINKs to now relate temporal functions in terms of LINK's full spectrum of relations, TimeML should have, in principle, all the necessary expressive power for representing arbitrarily complex temporal expressions with arbitrarily complex relations.

Some notes on these sample representations:

  1. We have used pre-theoretical, relatively obvious names for the functions in the examples above. We intend to use functions (tags) that, as closely as possible, follow the names of the temporal functions in the ontology we adopt. By doing this, our ontology would provide a denotational semantics for our XML-based temporal function markup.
  2. Similarly, we have used relatively arbitrary and generic names for the attributes of the sample temporal functions (i.e. 'argumentID' and 'value'). We would want to give the attributes (i.e. arguments or parameters) of these function entities more meaningful names in the final version of this specification.
  3. Since it would be unwieldy for our markup to recursively wrap SUCCESSOR (or PREDECESSOR) around itself, we have given SUCCESSOR (and PREDECESSOR) a second integer argument, which essentially encodes the number of calls to the function; i.e.
    (SUCCESSOR foo 1) = (primSuccessor foo)
    (SUCCESSOR foo 2) = (primSuccessor (primSuccessor foo))
    ...
    (SUCCESSOR foo n) = (primSuccessor_1 ... (primSuccessor_n foo) ...)

    This is much like the INCF (increment-by) and DECF (decrement-by) macros in Common LISP, except that INCF and DECF allow their second argument to be omitted, in which case it defaults to 1. For our purposes, we want the 'count' argument to always be present.
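
The count argument amounts to n-fold application of a primitive step. A minimal Python sketch follows, assuming weeks are represented by their Monday; the names repeat and prim_successor are hypothetical and stand in for whatever primitives the adopted ontology provides.

```python
from datetime import date, timedelta
from typing import Callable, TypeVar

T = TypeVar("T")


def repeat(prim: Callable[[T], T], x: T, count: int) -> T:
    """Apply prim to x exactly count times; count must be given and positive."""
    if count < 1:
        raise ValueError("count must be at least 1")
    for _ in range(count):
        x = prim(x)
    return x


def prim_successor(week_start: date) -> date:
    """The primitive step: move forward by one week."""
    return week_start + timedelta(weeks=1)


def successor(week_start: date, count: int) -> date:
    """(SUCCESSOR foo n): n nested calls to the primitive successor."""
    return repeat(prim_successor, week_start, count)
```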

3.5 Aspectual Verbs

Relations expressed through predicates and nominal expressions are typically anchored as deictic events. The aspect expressed on the verb is a means of looking inside the event to focus on a segment or particular part of an event. For example,

  1. a. John built the house.
    b. John has built the house.
    c. John is building the house.
    d. John had built the house.

In languages such as English and French, there is an additional grammatical device of aspectual predication, which focuses on four facets of the event history:

  1. a. Initiation: begin, start
    b. Termination: stop, end
    c. Completion: finish, complete
    d. Continuation: continue, keep

Here, a member of a closed class of predicates is able to select a verbal or nominal complement as an argument and mark that event with the function (designation) associated with one of the facets above.

For TimeML, we will designate the class of aspectual predicates as events of class ASPECTUAL. This class will have an additional attribute which we will call PHASE. This attribute will take one of the four facets listed above. Finally, we will add the attribute ARGEVENT to ASPECTUAL events as well.

<EVENT>

attributes ::= eid class [argEvent] [tense] [aspect] [phase]
//* N.B. argEvent is dependent on class='REPORTING','ASPECTUAL', or 'PERCEPTION'.
eid ::= <integer>
*class ::= 'OCCURRENCE' | 'PERCEPTION' | 'REPORTING' | 'ASPECTUAL'
argEvent ::= <integer>
tense ::= 'PAST' | 'PRESENT' | 'FUTURE' | 'NONE'
aspect ::= 'PROGRESSIVE' | 'PERFECTIVE'
phase ::= 'INITIATION' | 'COMPLETION' | 'TERMINATION' | 'CONTINUATION'

To illustrate this mark up, consider a couple of example sentences.

  1. The boat began to sink quickly.
  2. The search party stopped looking for the survivors.

These two sentences are represented below in the markup defined above.

The boat
<EVENT eid=1 class=ASPECTUAL tense=PAST aspect=PERFECTIVE phase=INITIATION argEvent=2>
began
</EVENT>
to
<EVENT eid=2 class=OCCURRENCE tense=NONE>
sink
</EVENT>
quickly.

The search party
<EVENT eid=1 class=ASPECTUAL tense=PAST aspect=PERFECTIVE phase=TERMINATION argEvent=2>
stopped
</EVENT>
<EVENT eid=2 class=OCCURRENCE tense=NONE aspect=PROGRESSIVE>
looking
</EVENT>
for the survivors.
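
Because aspectual predicates form a closed class, assigning the phase attribute lends itself to a simple lookup. The following Python sketch is illustrative: the table is limited to the eight verbs named in this section, the function name aspectual_event is hypothetical, and lemmatization (mapping "began" to "begin") is assumed to happen elsewhere.

```python
# Lemma-to-phase table built from the four facets listed in this section.
PHASE_BY_VERB = {
    "begin": "INITIATION",
    "start": "INITIATION",
    "stop": "TERMINATION",
    "end": "TERMINATION",
    "finish": "COMPLETION",
    "complete": "COMPLETION",
    "continue": "CONTINUATION",
    "keep": "CONTINUATION",
}


def aspectual_event(eid, lemma, arg_event, tense):
    """Build the attributes of an ASPECTUAL event, or return None for other verbs."""
    phase = PHASE_BY_VERB.get(lemma)
    if phase is None:
        return None
    return {
        "eid": eid,
        "class": "ASPECTUAL",
        "tense": tense,
        "phase": phase,
        "argEvent": arg_event,
    }


began = aspectual_event(1, "begin", 2, "PAST")
```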

3.6 Confidence Levels

In various discussions of the full TERQAS group, the utility of being able to mark confidence values for various aspects of the annotation was pointed out. In general, it would be useful to allow confidence values to be assigned to any tag, and, in fact, to any attribute of any tag.

A convenient way to do this would be to create a confidence tag, which would consume no input, and which would have the following attributes:

<CONFIDENCE>

attributes ::= tagType tagID [attributeName] confidenceValue

where

tagType
would range over the names of all the tags of TimeML
tagID
would range over the set of actual tag IDs within the current document
attributeName
would range over the names of all the attributes of all the tags of TimeML
confidenceValue
would range over the rationals (i.e. would have a floating point value) between 0 and 1

So, for example, given this annotation:

The TWA flight
<EVENT eid=1 class=OCCURRENCE tense=PAST aspect=NONE>
crashlanded
</EVENT>
<LINK eventID=1 signalID=1 relatedToTime=1 relType=BEFORE durationID=1/>
on Easter Island
<DURATION did=1 value=2w>
two weeks
</DURATION>
<SIGNAL sid=1>
ago
</SIGNAL>.

...

<DocCreationTime>
<TIMEX tid=1 type=DATE calDate=20121999>
12-20-1999
</TIMEX>
</DocCreationTime>

if we wanted to indicate that we were unsure whether we had annotated DURATION correctly, we could add this annotation:

<CONFIDENCE tagType=DURATION tagID=1 confidenceValue=0.50/>

where the lack of the optional attribute, attributeName, indicates that the confidence applies to the whole tag.

On the other hand, if we wanted to indicate that we weren't sure if the tense of 'crashlanded' was really PAST, we could add this annotation:

<CONFIDENCE tagType=EVENT tagID=1 attributeName=tense confidenceValue=0.75/>

Abstracting confidence measures as a separate tag frees the annotation from having to include a confidence value attribute in every tag and eliminates the problem of uncertainty over the exact attribute of a tag the confidence value applies to.

Note: currently LINKs do not have IDs. If we want to apply confidence measures to LINKs and/or their attributes, we will need to give each LINK a unique ID under this proposal.
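
The CONFIDENCE proposal can be sketched as a separate list of records consulted alongside the annotation. The Python below is a hypothetical illustration: the class and function names are assumptions, as is the rule that an attribute without its own record falls back to the whole-tag record and then to a default of 1.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Confidence:
    tag_type: str
    tag_id: int
    confidence_value: float
    attribute_name: Optional[str] = None  # None: applies to the whole tag

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence_value <= 1.0:
            raise ValueError("confidenceValue must lie between 0 and 1")


def confidence_for(
    records: Iterable[Confidence],
    tag_type: str,
    tag_id: int,
    attribute_name: Optional[str] = None,
) -> float:
    """Return the most specific confidence recorded, defaulting to 1.0."""
    whole_tag = None
    for record in records:
        if record.tag_type != tag_type or record.tag_id != tag_id:
            continue
        if attribute_name is not None and record.attribute_name == attribute_name:
            return record.confidence_value
        if record.attribute_name is None:
            whole_tag = record.confidence_value
    return 1.0 if whole_tag is None else whole_tag


# The two CONFIDENCE annotations from the example above.
records = [
    Confidence("DURATION", 1, 0.50),
    Confidence("EVENT", 1, 0.75, "tense"),
]
```

Keeping the records apart from the tags means that unmarked tags and attributes cost nothing in the annotation file.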

As for how confidence values should be assigned in manual annotation, we feel that, in a large-scale annotation effort such as TIMEBANK, two conditions should be satisfied:

  1. Fairly high inter-annotator agreement on the tag assignment in the text.
  2. Ease of use and habitability of the tool from the annotator's perspective.

Therefore, the annotation of a scalar value such as confidence should have at least two features:

  1. Human annotators should be constrained to a subset of the possible values, and this constraint should be documented in the annotation guidelines and implemented in the annotation tool.
  2. The annotation tool should probably present natural language descriptions rather than numbers, with the descriptions represented numerically in the underlying annotation. For example, the annotator might pick "moderately certain", which would enter the annotation as .5.

Moreover, for manual annotation, it does not seem that the 0 and 1 values will be useful. Presumably, if the annotator doesn't trust an annotation at all, s/he won't add it. And 1, at least for manual annotation, should be the default or unmarked value, and so need not be noted, since noting it would bulk up the files considerably, even if it were used only on entire tags.

3.7 <SCALE>

The notion of scale was introduced, but we will defer discussion until the next version of this document.

4.0 Further Modifications and Additions

4.1 EVENT and ArgEvent

As useful and intuitive as the phase attribute appears to be for events, we will consider an alternative markup approach for the current specification, motivated by concerns raised in an annotation working group meeting. Namely, we will remove the ArgEvent attribute from specific event types that have arguments, and introduce this dependency through the LINK relation. This has the advantage of allowing for multiple arguments to predicates (such as "The ship began to fall apart and sink") without creating ad hoc composite events.

The event types affected by this change are listed below:

  1. Aspectual verbs: no arguments; instead introduce a LINK whose relType is taken from the PHASE values.
  2. Reporting verbs: no arguments; instead use a LINK whose relType is taken from the reporting values.
  3. Belief verbs: no arguments; use a LINK whose relType is taken from the belief relations.
  4. Perception verbs: no arguments; use a LINK when necessary, most probably with relType INCLUDES.

The boat
<EVENT eid=1 tense=PAST aspect=PERFECTIVE>
began
</EVENT>
to
<EVENT eid=2 tense=null aspect=null>
sink
</EVENT>.
<LINK eventID=1 relatedToEvent=2 relType=INITIATION/>

Note also that the domain of values for relType must either be expanded to include the values for the attribute PHASE, or we will have to introduce PHASE as a value for LINK. We have chosen the former for now.

4.2 STATE

Based on annotator input, we have decided to demote STATE from an individual tag to a specific value of the CLASS attribute of the EVENT tag. The other attributes associated with events will potentially be available to this class of predicates.

4.3 DURATION

After significant annotator input, the decision was made to add DURATION as an additional type of TIMEX3. The previous example making use of DURATION as a separate tag would now be annotated as follows:

John taught for the 6 months ending March 31, 2001.


John
<EVENT eid=1 tense=PAST aspect=PERFECTIVE class=OCCURRENCE>
taught
</EVENT>
<LINK eventID=1 signalID=1 relatedToTime=2 relType=COEXTENSIVE/>
<SIGNAL sid=1>
for
</SIGNAL>
<TIMEX3 tid=2 durationID=1 type=INTERVAL valueFromFunction=tf1>
the
<DURATION did=1 value=6 scale=MONTH>
6 months
</DURATION>
</TIMEX3>
<SIGNAL sid=2>
ending
</SIGNAL>
<TIMEX3 tid=1 value=2001-03-31>
March 31, 2001
</TIMEX3>
<endPoint tfid=tf1 signalID=2 durationID=1/>
<LINK temporalFunctionID=tf1 relatedToTime=1 relType=IDENTITY/>
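As a rough illustration of the temporal math that the endPoint function stands for in this example, the sketch below derives an interval from an end date and a DURATION. The function names and the boundary convention (stepping back whole calendar months and clamping the day) are assumptions; the specification does not define how the start of the interval is computed.

```python
import calendar
from datetime import date

def months_before(anchor, months):
    """Step back whole calendar months, clamping the day to the month length."""
    index = anchor.year * 12 + (anchor.month - 1) - months
    year, month0 = divmod(index, 12)
    day = min(anchor.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)

def end_point(anchor, value, scale):
    """Return (start, end) for a duration that ends at `anchor`.

    Only scale=MONTH is handled in this sketch.
    """
    if scale != "MONTH":
        raise NotImplementedError(f"scale {scale} is not handled in this sketch")
    return months_before(anchor, value), anchor

# "the 6 months ending March 31, 2001"
start, end = end_point(date(2001, 3, 31), 6, "MONTH")
print(start.isoformat(), end.isoformat())
```

Whether the interval should begin on the computed day or on the following day is left open here, since the text does not say.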

4.4 Aspectual Verbs Revisited

<EVENT>

attributes ::= eid class [tense] [aspect]
eid ::= <integer>
*class ::= 'OCCURRENCE' | 'PERCEPTION' | 'REPORTING' | 'ASPECTUAL'
argEvent ::= <integer>
tense ::= 'PAST' | 'PRESENT' | 'FUTURE' | 'NONE'
aspect ::= 'PROGRESSIVE' | 'PERFECTIVE'

Add to relation types:

relType ::= 'INITIATION' | 'COMPLETION' | 'TERMINATION' | 'CONTINUATION'

Use the LINK tag to establish the relation. Here is an example with the new definition of EVENT for aspectual verbs.

  1. The child began to choke and shake uncontrollably.
The child
<EVENT eid=1 tense=PAST aspect=PERFECTIVE class=ASPECTUAL>
began
</EVENT>
to 
<EVENT eid=2 tense=null aspect=null class=OCCURRENCE>
choke
</EVENT>
and
<EVENT eid=3 tense=null aspect=null class=OCCURRENCE>
shake
</EVENT>
uncontrollably.
<LINK eventID=1 relatedToEvent=2 relType=INITIATES/>
<LINK eventID=1 relatedToEvent=3 relType=INITIATES/>
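The example above attaches one LINK per governed event rather than building a composite event. A minimal sketch of that pattern follows; the helper name `aspectual_links` is hypothetical, and the attribute spelling follows the LINK grammar (eventID, relatedToEvent).

```python
def aspectual_links(aspectual_eid, governed_eids, rel_type="INITIATES"):
    """Build one LINK per event governed by an aspectual verb."""
    return [
        f"<LINK eventID={aspectual_eid} relatedToEvent={eid} relType={rel_type}/>"
        for eid in governed_eids
    ]

# "began" (eid 1) governs "choke" (eid 2) and "shake" (eid 3).
for link in aspectual_links(1, [2, 3]):
    print(link)
```

Adding a third coordinated verb only adds one more LINK; the EVENT tags themselves are unchanged.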

4.5 <LINK>

attributes ::= (eventID | timeID) [signalID] (relatedToEvent
               | relatedToTime) relType magnitude

eventID ::= <integer>
timeID ::= <integer>
signalID ::= <integer>
relatedToEvent ::= <integer>
relatedToTime ::= <integer>
*relType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' |
             'SIMULTANEOUS' | 'IAFTER' | 'IBEFORE' | 'ID' |
             'INITIATES' | 'CULMINATES' | 'TERMINATES' | 'CONTINUES'
magnitude ::= <integer>
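One way to read the grammar above is as a set of checks on a LINK's attributes. The sketch below is an assumption-laden illustration, not a reference validator: the function name `check_link` is hypothetical, and magnitude is treated as optional because the examples in this document omit it, even though the production does not bracket it.

```python
# Relation types exactly as listed in the relType production above.
REL_TYPES = {
    "BEFORE", "AFTER", "INCLUDES", "IS_INCLUDED",
    "SIMULTANEOUS", "IAFTER", "IBEFORE", "ID",
    "INITIATES", "CULMINATES", "TERMINATES", "CONTINUES",
}
INTEGER_ATTRS = ("eventID", "timeID", "signalID",
                 "relatedToEvent", "relatedToTime", "magnitude")

def check_link(attrs):
    """Return a list of problems found in a dict of LINK attributes."""
    problems = []
    if sum(key in attrs for key in ("eventID", "timeID")) != 1:
        problems.append("need exactly one of eventID, timeID")
    if sum(key in attrs for key in ("relatedToEvent", "relatedToTime")) != 1:
        problems.append("need exactly one of relatedToEvent, relatedToTime")
    if attrs.get("relType") not in REL_TYPES:
        problems.append("relType is missing or not in the grammar")
    for key in INTEGER_ATTRS:
        if key in attrs and not str(attrs[key]).isdigit():
            problems.append(f"{key} must be an integer")
    return problems

print(check_link({"eventID": 1, "relatedToEvent": 2, "relType": "INITIATES"}))
```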

4.6 <EVENT> with POLARITY

attributes ::= eid class [tense] [aspect] polarity
eid ::= <integer>
*class ::= 'OCCURRENCE' | 'PERCEPTION' | 'REPORTING' | 'ASPECTUAL' |
           'INTENDING' | 'STATE'
tense ::= 'PAST' | 'PRESENT' | 'FUTURE' | 'IRREALIS' | 'NONE'
polarity ::= 'POSITIVE' | 'NEGATIVE'
aspect ::= 'PROGRESSIVE' | 'PERFECTIVE'
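To make the revised EVENT grammar concrete, the following sketch renders an EVENT tag in the layout used by the examples in this document and rejects values outside the grammar. The function name `render_event` and the sample sentence fragment are assumptions made for illustration only.

```python
# Allowed values copied from the grammar above.
ALLOWED = {
    "class": {"OCCURRENCE", "PERCEPTION", "REPORTING", "ASPECTUAL",
              "INTENDING", "STATE"},
    "tense": {"PAST", "PRESENT", "FUTURE", "IRREALIS", "NONE"},
    "aspect": {"PROGRESSIVE", "PERFECTIVE"},
    "polarity": {"POSITIVE", "NEGATIVE"},
}
ORDER = ("eid", "class", "tense", "aspect", "polarity")
REQUIRED = ("eid", "class", "polarity")  # tense and aspect are bracketed

def render_event(text, attrs):
    """Return an EVENT tag wrapping `text`, or raise ValueError."""
    for name in REQUIRED:
        if name not in attrs:
            raise ValueError(f"missing required attribute: {name}")
    for name, value in attrs.items():
        if name == "eid":
            if not str(value).isdigit():
                raise ValueError("eid must be an integer")
        elif name not in ALLOWED:
            raise ValueError(f"unknown attribute: {name}")
        elif value not in ALLOWED[name]:
            raise ValueError(f"{name}={value} is not in the grammar")
    pairs = " ".join(f"{name}={attrs[name]}" for name in ORDER if name in attrs)
    return f"<EVENT {pairs}>\n{text}\n</EVENT>"

# Hypothetical negated event, as in "The boat did not sink."
print(render_event("sink", {"eid": 1, "class": "OCCURRENCE",
                            "tense": "PAST", "polarity": "NEGATIVE"}))
```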


Bibliography

Ferro, Lisa, Sundheim, Beth, and Gerber, Laurie (2002) Instruction Manual for the Annotation of Temporal Expressions, MITRE Washington C3 Center, McLean, Virginia.

Fowler, Martin (1999) Refactoring: Improving the Design of Existing Code. Addison-Wesley, Reading, Massachusetts.

Setzer, Andrea (2001) Temporal Information in Newswire Articles: An Annotation Scheme and Corpus Study, Doctoral Dissertation, University of Sheffield, Sheffield, UK.