This document is largely a rehash of the readme.html document in the training data distribution, but the last paragraphs of the "Data Description" section are different and a section on how to submit results has been added at the end.
We describe the TempEval data, the way they were created, the validation and scoring scripts that are bundled with the data, and the format for submissions. This document does not replace the task description on the SemEval and TempEval websites, but complements it.
The files in data/taskAB contain all TLINKs required by tasks A and B, and data/taskC contains all links needed for task C. However, the relType attribute of each TLINK is set to UNKNOWN. The task is to replace the UNKNOWN values with one of the six allowed values listed above.
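As a minimal illustration of the output format, the sketch below fills in every UNKNOWN relType with a single default relation (a trivial one-class baseline). It assumes the documents are well-formed XML with TLINK elements carrying a relType attribute, as described above; the default value OVERLAP is just an example of one of the six allowed relations, and a real system would of course predict a different relation per link.

```python
import xml.etree.ElementTree as ET

def fill_unknown_relations(infile, outfile, default="OVERLAP"):
    """Replace UNKNOWN relType values on TLINK elements with a default.

    A trivial one-class baseline: every UNKNOWN link receives the same
    relation (here OVERLAP, assumed to be one of the six allowed values).
    """
    tree = ET.parse(infile)
    for tlink in tree.iter("TLINK"):
        if tlink.get("relType") == "UNKNOWN":
            tlink.set("relType", default)
    tree.write(outfile)
```

Links that already carry a real relation value are left untouched, so the script can safely be run over a partially completed file.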
It is not necessary to determine which events are on the Event Target List (ETL). Recall that the ETL consists of those events that occur 20 times or more in the corpus. A complete list of stems, ordered by frequency, is included in the docs directory (only stems occurring more than once are added to the list).
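The ETL definition above amounts to a simple frequency cutoff over event stems. The sketch below shows that computation for illustration; the function name and the list-of-stems input are hypothetical, and the actual ETL shipped in the docs directory is authoritative.

```python
from collections import Counter

def event_target_list(stems, threshold=20):
    """Return (stem, count) pairs occurring at least `threshold` times,
    ordered by descending frequency, mirroring the ETL definition."""
    counts = Counter(stems)
    return [(s, n) for s, n in counts.most_common() if n >= threshold]
```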
The data directory has two subdirectories, one with the data for tasks A and B, and one with the data for task C. Both contain 20 documents.
It should be noted that the test set included here does not quite follow the specifications in the task definition, where we said we would annotate 20-25 documents drawn from a source like TimeBank and that these documents would contain at least 5 instances of every event on the ETL. This proved impractical, and ultimately impossible. Instead, we opted for a more traditional approach and split TimeBank into a training set and a test set.
(TimeBank is available at http://timeml.org/site/timebank/timebank.html.) The annotation procedure for TLINKs includes dual annotation by seven annotators using a web-based annotation interface (see the screen shot page for more details). After this phase, two experienced annotators looked at all cases where the two annotators differed as to which relation type to select. For task C, there was an extra annotation phase in which the main events were selected. Annotation guidelines for main event annotation are included in this distribution.
relType attribute.
To validate TempEval files using the DTD, open a terminal window (Linux/Unix/MacOSX) or a command prompt (Windows) and type the following:

% perl validate.pl ../data/taskAB
% perl validate.pl ../data/taskC

This will write validation errors and warnings to the standard output. All lines with INFO-300 can in general be ignored; they report on reference counts. On Unix/Linux systems, these lines can be filtered out by using:

% perl validate.pl ../data/taskAB | grep -v INFO-300
% perl validate.pl ../data/taskC | grep -v INFO-300
The script requires the Perl modules XML::Checker and XML::RegExp, both available from CPAN (http://www.cpan.org).
taskAB/
taskC/
Directory taskAB should contain all 20 documents from data/taskAB, with the UNKNOWN values of the relType attribute replaced by one of the six TempEval relations. Similarly, taskC should contain all 20 documents from data/taskC with the UNKNOWN values replaced. Participants who choose not to participate in task C can leave this directory empty.
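Before submitting, it may help to confirm that a directory holds the expected number of documents and that no UNKNOWN values remain. The sketch below is a hypothetical pre-submission check, not part of the distribution; it assumes the documents use a .tml extension and that relType values appear literally in the file text, so adjust both assumptions to match your local copy.

```python
import os

def check_submission(directory, expected=20, extension=".tml"):
    """Sanity-check a submission directory: right file count, no UNKNOWN left.

    Returns a list of human-readable problem descriptions (empty if clean).
    The .tml extension is an assumption; change it if your files differ.
    """
    files = sorted(f for f in os.listdir(directory) if f.endswith(extension))
    problems = []
    if len(files) != expected:
        problems.append("expected %d files, found %d" % (expected, len(files)))
    for name in files:
        with open(os.path.join(directory, name)) as f:
            if 'relType="UNKNOWN"' in f.read():
                problems.append("%s: UNKNOWN relType remains" % name)
    return problems
```

An empty return value means the directory passed both checks; otherwise each entry describes one problem to fix before submission.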
Please direct questions to tempeval@timeml.org.