Brandeis University, version 1.0, January 2008.
The AQUAINT TimeML Corpus (previously referred to as the Opinion Corpus or the AQUAINT TimeBank corpus) contains 73 news articles that have been annotated with temporal information, adding events, times and temporal links between events and times. The annotation follows the TimeML 1.2.1 specifications. This file includes a brief discussion of TimeML as well as a description of how the AQUAINT TimeML Corpus was created.
The most recent information on TimeML is always available at www.timeml.org.
TimeML aims to capture and represent temporal information. This is accomplished using four primary tag types: TIMEX3 for temporal expressions, EVENT for temporal events, SIGNAL for temporal signals, and LINK for representing relationships. For a detailed description of TimeML, see the TimeML 1.2.1 Specification and Guidelines, available at http://timeml.org/site/publications/specs.html. Here, we give a summary of the most important tags.
TIMEX3 — This tag is used to capture dates, times, durations, and sets of dates and times. All TIMEX3 tags include a type and a value along with some other possible attributes. The value is given according to the ISO 8601 standard. The TIMEX3 tag also allows specification of a temporal anchor, which facilitates the use of temporal functions to calculate the value of an underspecified temporal expression. For example, an article might include a document creation time such as "January 3, 2006". Later in the article, the temporal expression "today" may occur. By anchoring the TIMEX3 for "today" to the document creation time, we can determine the exact value of the TIMEX3.
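The anchoring idea can be illustrated with a minimal Python sketch. It is not part of the TimeML tools: the function name resolve_value and the small offset table are hypothetical, and only a few day-level expressions are handled, assuming the anchor value is a full ISO 8601 calendar date.

```python
from datetime import date, timedelta

# Hypothetical offsets (in days) for a few relative expressions.
# Real temporal functions cover far more cases than this table.
RELATIVE_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def resolve_value(expression: str, anchor_value: str) -> str:
    """Return an ISO 8601 date for a relative expression.

    anchor_value is the value of the anchoring TIMEX3, for example the
    document creation time, given as YYYY-MM-DD.
    """
    anchor = date.fromisoformat(anchor_value)
    offset = RELATIVE_OFFSETS[expression.lower()]
    return (anchor + timedelta(days=offset)).isoformat()


# Document creation time "January 3, 2006" expressed in ISO 8601.
dct_value = "2006-01-03"
print(resolve_value("today", dct_value))      # 2006-01-03
print(resolve_value("yesterday", dct_value))  # 2006-01-02
```

The sketch only shows why an anchor is needed: without the document creation time, "today" has no computable value.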
EVENT — The EVENT tag is used to annotate those
elements in a text that mark the semantic events described by it. Any
event that can be temporally anchored or ordered is captured with this
tag. An EVENT includes a class attribute with values such
as
SIGNAL — The SIGNAL tag is used to annotate temporal function words such as "after", "during", and "when". These signals are then used in the representation of a temporal relationship.
The following three tags are link tags. They capture temporal, subordination, and aspectual relationships found in the text. These tags do not consume any actual text, but they do relate the three tag types above to each other.
TLINK — Temporal links are represented with a TLINK
tag. A TLINK can temporally relate two temporal expressions, two
event instances, or a temporal expression and an event instance.
Along with an identification marker for each of these two elements, a
relation type is given such as
SLINK — This tag is used to capture subordination
relationships that involve event modality, evidentiality, and
factuality. An SLINK includes an event instance ID for the
subordinating event and an event instance ID for the subordinated
event. Possible relation types for SLINK
include
ALINK — An aspectual connection between two event
instances is represented with ALINK. As with SLINK, this tag includes
two event instance IDs, one that introduces the ALINK and one that is
the event argument to that event. The introducing event has the
class
The AQUAINT TimeML corpus contains 73 articles from a variety of news reports. These particular sources were chosen because they offered text rich with temporal information both in the form of temporal expressions and events that could be anchored or ordered in time. The documents were taken from four topics from the TREC novelty track (see http://trec.nist.gov/tracks.html):
N16  Kenya Tanzania Embassy bombings
N19  Elian Gonzalez Cuba
N35  NATO, Poland, Czech Republic, Hungary
N45  Slepian abortion murder
The corpus contains about 35,000 tokens and some 16,000 tags were added (12,000 if we adjust for the redundancy introduced by the EVENT and MAKEINSTANCE tags). Some annotation statistics are printed in the table below:
                  N16     N19     N35     N45   total
EVENT             765    2117     490    1060    4432
MAKEINSTANCE      765    2117     490    1060    4432
TIMEX3            115     253      83     154     605
SIGNAL             33      77      62      96     268
ALINK              11      39      13       8      71
SLINK              96     203     117     259     675
TLINK            1013    2788     516    1048    5365
Total tags       2798    7594    1771    3685   15848
Total tokens     7027   16242    3631    7254   34154
Documents          23      25      10      15      73
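The totals in the table can be recomputed with a short Python sketch. The counts are copied from the table; the variable names are illustrative, and the redundancy adjustment is assumed here to mean dropping the MAKEINSTANCE count, since each EVENT has exactly one MAKEINSTANCE in these figures.

```python
# Tag counts per topic, in the order N16, N19, N35, N45 (from the table).
counts = {
    "EVENT":        [765, 2117, 490, 1060],
    "MAKEINSTANCE": [765, 2117, 490, 1060],
    "TIMEX3":       [115, 253, 83, 154],
    "SIGNAL":       [33, 77, 62, 96],
    "ALINK":        [11, 39, 13, 8],
    "SLINK":        [96, 203, 117, 259],
    "TLINK":        [1013, 2788, 516, 1048],
}

# Sum each column to get the number of tags per topic.
per_topic = [sum(column) for column in zip(*counts.values())]
total_tags = sum(per_topic)

# Assumed adjustment: count each event once by leaving out MAKEINSTANCE.
adjusted = total_tags - sum(counts["MAKEINSTANCE"])

print(per_topic)   # [2798, 7594, 1771, 3685]
print(total_tags)  # 15848
print(adjusted)    # 11416
```

Under this reading of the adjustment, the corpus has 15,848 tags in total and 11,416 once the EVENT/MAKEINSTANCE redundancy is removed.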
The documents in N16, N19 and N45 contain only TimeML tags; the documents in N35 also contain document-level tags such as DOCNO, HEADER, TEXT, and others.
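When counting tags across all four directories, the extra document-level tags in N35 have to be ignored. The following Python sketch shows one way to do this with the standard library. The sample fragment is invented for illustration and is not taken from the corpus; its attribute names and the function name count_timeml_tags are assumptions, not part of the distribution.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The seven TimeML tags reported in the statistics table.
TIMEML_TAGS = {"EVENT", "MAKEINSTANCE", "TIMEX3", "SIGNAL",
               "ALINK", "SLINK", "TLINK"}

# Invented sample with document-level wrapper tags (DOC, DOCNO, TEXT).
SAMPLE = """<DOC>
<DOCNO>SAMPLE-0001</DOCNO>
<TEXT>
The embassy <EVENT eid="e1">reopened</EVENT>
<SIGNAL sid="s1">on</SIGNAL>
<TIMEX3 tid="t1" type="DATE" value="1998-08-10">Monday</TIMEX3>.
</TEXT>
<MAKEINSTANCE eiid="ei1" eventID="e1"/>
<TLINK lid="l1" eventInstanceID="ei1" relatedToTime="t1"/>
</DOC>"""


def count_timeml_tags(xml_text: str) -> Counter:
    """Count TimeML tags in a document, skipping any other elements."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter() if el.tag in TIMEML_TAGS)


tag_counts = count_timeml_tags(SAMPLE)
print(sum(tag_counts.values()))  # 5
```

Filtering on a fixed tag set keeps the counts comparable between N35 and the other three directories.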
Each article was annotated by one of three experienced annotators from Brandeis University or Georgetown University. All documents were then validated against version 1.2.1 of the TimeML Document Type Definition. Validity checking against the DTD was performed using the Perl XML::Checker::Parser module, available as part of XML-Checker-0.13 from www.cpan.org, using the validate.pl script.
Note that although we validated all annotations, the AQUAINT TimeML corpus is not as mature as TimeBank 1.2. More specifically, it did not go through several rounds of annotation and annotation review. Also, neither TimeBank 1.2 nor the AQUAINT TimeML corpus has used dual annotation.
Inter-annotator agreement (IAA) scores were not computed for the AQUAINT TimeML corpus. Refer to the TimeBank 1.2 documentation for the IAA scores for TimeBank 1.2.
The following people have contributed to the AQUAINT TimeML corpus:
Annotation:  Jenna Fernandes, Jessica Moszkowicz, Stephanie Poisson
Validation:  Seo-Hyun Im, Emin Mimaroglu, Jessica Moszkowicz, Hongyuan Qiu, Marc Verhagen
Other:       James Pustejovsky, Inderjeet Mani, Roser Saurí, Amber Stubbs, Marc Verhagen
The AQUAINT TimeML corpus was created as part of the TARSQI project which was funded under the ARDA/DTO AQUAINT program under grant number NBCH040027.
aquaint_timeml_1.0/data/
Contains the 73 annotated documents, grouped in four directories.
aquaint_timeml_1.0/doc/
This file plus the TimeML specifications and guidelines.
aquaint_timeml_1.0/validation/
Contains two versions of the DTD and the Perl script used for validation.
The annotations in this data collection are copyrighted by Brandeis University and are released under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
Note: The textual news documents annotated in this corpus have been collected from a wide range of sources and are not copyrighted by Brandeis University. The user acknowledges that the use of these news documents is restricted to research and academic purposes only.