TERQAS Kick-Off Meeting
January 30-31, 2002
Attendees: James Pustejovsky, José Castaño, Roser Saurí, Robert Ingria, Marc Verhagen, Drago Radev, Antonio Sanfilippo, John Frank, Beth Sundheim, Bran Boguraev, Lisa Ferro, George Wilson, Inderjeet Mani, Jean Michel Pomareda, Penny Lehtola, Mark Maybury, David Day, and Bev Nunan
Table of Contents
January 30, 2002
January 31, 2002
2. Corpus Collection/Definition
3. Specification and Definition of TenseML
4. Query Corpus Construction
5. Algorithm Review and Development
GOALS AND OBJECTIVES
Relevant issues under discussion here:
ACE (Automatic Content Extraction)
There was a long discussion about how to evaluate. It was stated that developing the gold standard will be a cyclic process. The group needs to agree on a subset of data sets, and needs to evaluate and develop with respect to how the work relates to QA. By March or April, specifics on how to evaluate will be needed. The government urged the group to have a long-range vision. To think future!
One of the data sets that will be used in the workshop is PropBank and that should be available by mid-March.
Corpus for Analysis: Evaluation and Selection
Beth Sundheim & Lisa Ferro
TIDES as a text set for Time Expressions.
The project has focused on time adverbials in isolation (e.g., prepositions are not taken into account, even if they are part of the Temporal Expression). Temporal relations aren't marked here, either.
The current task isn't the categorization of time adverbials but the assignment of values to expressions. The manual annotation is intended to guide the training of the future computer annotation task.
The representation is an extension of the ISO standard on temporal markup. The basic notation is in terms of points. Among other issues, TIDES covers: durations, week values, imprecise temporal expressions (summer, etc.), truncation, and other forms of imprecision.
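As a rough illustration of this point-based, ISO-derived notation, the sketch below pairs a few expressions with plausible value strings (the values follow TIMEX2-style conventions as we understand them; they are illustrative, not quoted from the TIDES specification):

```python
# Illustrative expression -> value pairs in the point-based, ISO-derived
# notation (TIMEX2-style strings as we understand them; not normative).
examples = {
    "January 30, 2002":       "2002-01-30",  # fully specified point
    "the fifth week of 2002": "2002-W05",    # week value
    "summer":                 "2002-SU",     # imprecise season (year from context)
    "three months":           "P3M",         # ISO 8601 duration
    "the 1990s":              "199",         # truncation marks a whole decade
}
for expr, val in examples.items():
    print(f"{expr!r:28} -> {val}")
```

Note how truncation handles imprecision directly: a decade is simply a year value with the final digit left off.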
Issues arising from this discussion (and from the evaluation of the tagset):
Further topics that can be explored, not handled in TIDES:
Essentially, there are 2 directions to take given TIDES as starting point:
There has to be agreement on:
Certain expressions should be characterized by presenting scaled (or fuzzy) borders (e.g., century meaning approx. 100 years).
Polysemy of temporal expressions should be coded in the markup. Prior to this, there has to be an identification of the different types of ambiguity; e.g., next Monday (either one interpretation or the other), last year (vague), fifth anniversary (event temporal expression).
It could be interesting to give a measure of confidence in the markup as part of the notation (currently there is only a "comments" field). This would make it possible to estimate the rate of possible error accumulation.
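A minimal sketch of what such a confidence field could look like, and how per-annotation confidences might be combined to estimate error accumulation (the record layout and the multiplicative combination are assumptions for illustration, not part of any existing spec):

```python
from dataclasses import dataclass

@dataclass
class TimexAnnotation:
    text: str          # surface expression
    value: str         # normalized value
    confidence: float  # annotator or tool confidence, in [0, 1]
    comment: str = ""  # the existing free-text "comments" field

def chain_confidence(annotations):
    """Crude estimate of accumulated reliability when several uncertain
    annotations feed a single inference: multiply the confidences
    (this assumes the individual errors are independent)."""
    p = 1.0
    for a in annotations:
        p *= a.confidence
    return p

chain = [TimexAnnotation("next Monday", "2002-02-04", 0.9),
         TimexAnnotation("last year", "2001", 0.7)]
print(round(chain_confidence(chain), 2))  # 0.63
```

Even this crude product makes the point: two individually plausible annotations can yield an inference that is markedly less reliable than either.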
On the use of context in the annotation process:
It's still not clear to what level and for what particular purposes to use the context. It may be interesting to contemplate different levels of information according to their origin: just from the temporal expression; from the phrase in which it appears (e.g., including info from prepositions); from the aspectuality of the event; from a supra-sentential level; or from the temporal info of the document (e.g., issuing date). In addition, the annotation should keep track of the level from which the info is obtained.
Similarly, it may be interesting to define different levels of "windows" for anaphora resolution and, as above, to keep track of it.
It would be useful to have a diary collecting problems arising from the marking up (Beth, Lisa). Beth said she has a laundry list of items that should be incorporated into the standards. For example: "the last three summers." It's a set-type annotation, but it is also a duration annotation. Everyone said they would like to see Beth's laundry list.
It was asked how long it takes to train an annotator, and how long it takes to annotate an article. The workshop would like to bring in someone to do the annotating. George Wilson mentioned that Inderjeet had six grad students from Georgetown annotating documents. They had two 3-hour training sessions. The result was an F-score of about .8. The articles were about 1000 words, and each took about 45 minutes to tag.
Single events and multiple events: there is a corpus from NIST. Could be useful.
Remarks and comments:
Suggestion: pick some functions and talk about time as a floating-point number. For TenseML: get the most flexible tense attribute.
Human and computer are interchangeable. Question: need to determine: Whether it's a full grain or background.
We need to tag speech events. Types of events: Persistent chain of events and isolated or ambiguous events.
Descriptor attributes come in handy. What's the database key? Persistent entity.
Future work should draw on a set of the queries of interest to TERQAS consumers.
In addition, it's important to do some corpus analysis in order to know the relevance of what we want to handle. So, future work should be both question- and corpus-driven.
It could be useful to apply an initial filter in order to catch as many expressions as possible (maximizing recall) and then apply a post-refinement process.
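The two-pass idea could be sketched as follows; the candidate pattern and the refinement rule here are toy placeholders, not actual TEMPEX heuristics:

```python
import re

# Pass 1: a deliberately broad pattern -- favors recall over precision.
CANDIDATE = re.compile(
    r"\b(\d{4}|\d{1,2}/\d{1,2}(?:/\d{2,4})?"
    r"|January|February|March|April|May|June|July|August"
    r"|September|October|November|December"
    r"|yesterday|today|tomorrow|last\s+\w+|next\s+\w+)\b",
    re.IGNORECASE,
)

def candidates(text):
    """Pass 1: catch as many potential temporal expressions as possible."""
    return [m.group(0) for m in CANDIDATE.finditer(text)]

def refine(spans):
    """Pass 2: post-refinement -- discard candidates that are clearly
    not temporal (a toy rule; real refinement would use context)."""
    non_temporal = {"word", "straw", "name"}
    return [s for s in spans
            if not (s.lower().startswith(("last", "next"))
                    and s.split()[-1].lower() in non_temporal)]

s = "The last straw came yesterday; next Monday, on 3/15/2002, talks resume."
print(refine(candidates(s)))  # ['yesterday', 'next Monday', '3/15/2002']
```

Here "last straw" is caught by the over-generous first pass and discarded by the second, while the genuinely temporal candidates survive.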
The markup should be monotonic.
NIST Corpus: Dragomir (???)
The annotation has to be independent from the interpreter; i.e., each client should be able to extract whatever it wants and interpret data in the way it wants. This is related to:
TenseML has to be flexible and extendible for future improvements. Make a DTD more than a tagset.
The general architecture of the system has to allow both human and machine work.
Tempex has levels of heuristics. More or less guessing.
It doesn't use typing; everything is a date.
TEMPEX doesn't cover:
Question arising here:
How far can we get with just FSA based on POS and heuristics like TEMPEX?
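As a concrete (and deliberately simplified) version of the question, here is a toy finite-state matcher over POS-tagged tokens; the patterns and the Penn-style tags are illustrative and much cruder than TEMPEX:

```python
MONTHS = {"January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"}

def match_dates(tagged):
    """Scan left to right over (token, POS) pairs; recognize sequences
    like 'January 30 , 2002' (month, day CD, comma, year CD).
    Returns (start, end) index spans."""
    spans = []
    i = 0
    while i < len(tagged):
        tok, pos = tagged[i]
        if tok in MONTHS:
            j = i + 1
            if j < len(tagged) and tagged[j][1] == "CD":      # day number
                j += 1
                if (j + 1 < len(tagged) and tagged[j][0] == ","
                        and tagged[j + 1][1] == "CD"):        # ", year"
                    j += 2
            spans.append((i, j))
            i = j
        else:
            i += 1
    return spans

tagged = [("On", "IN"), ("January", "NNP"), ("30", "CD"), (",", ","),
          ("2002", "CD"), ("the", "DT"), ("meeting", "NN"), ("began", "VBD")]
print([" ".join(t for t, _ in tagged[a:b]) for a, b in match_dates(tagged)])
```

Such a matcher handles fully lexicalized dates cheaply; the open question above is precisely where this style of recognition hits its ceiling (anaphoric, compositional, and context-dependent expressions).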
GOLD STANDARD CREATION
Relevant points from the presentation:
For the creation of a Gold Standard:
Issues arising from the discussion:
TERQAS Gold Standard has to be more standard than gold.
The Gold Standard can be useful for discovering further data (thus, data-driven approach).
Gold Standard in order to get concrete results.
The strategy should be a mixed approach:
Choose corpora that cover different domains and are temporally rich.
ACE corpus, very temporally rich.
The government suggested that, for developing a gold standard for the time standard, maybe we should let TIDES develop it and leverage off it. If there are other programs that are developing standards, then let's not use our resources. They would like to see more work on TenseML. We could use the TIDES or ACE annotation as a baseline but need to go further.
WORKSHOP DELIVERABLES: TenseML and TIMEBANK
Would like to see something produced that others could build from.
There is a data driven part and question driven part of the workshop. We probably need to work on both.
There is an implication that the QA system answers by scanning the entire corpus. You would have to read every document to get the time markup.
The idea of looking at timelines arises because you need to look at the entire corpus. The analyst is trying to come up with a story. One possibility: a gold standard of what transpired is your timeline, and then you could have a complete corpus.
TenseML: should be able to represent the timeline
Attributes that could be used to categorize questions:
1) Direct Question. Tensed predicate. Did John kill Jones?
2) Entailed Question. Tensed predicate. Is John dead?
3) Presupposed Questions.
1st-Order Event: A killed B
2nd-Order Event: A's killing B
Classic event hierarchy:
    event: state, transition, process
    transition: punctual achievement, accomplishment
Other relevant issues:
The question here:
What direction are we taking: from questions to semantic parsing or the other way round?
We can follow both strategies in parallel.
January 31, 2002
INFERRING TEMPORAL ORDERING IN NEWS
SUBGOALS TOWARDS EVENT TIMELINE CONSTRUCTION
1. Temporal Expression Recognition (levels of analysis)
Increase the power of TEMPEX:
[Quantificational force over any of the properties below]
Particular culture- and organization-related Time Expressions
Syntactic and morphological triggers (era, pre-, post-, ...)
coercion (e.g., on proper names: Vietnam)
uncertainty of how to interpret denotation of a Temporal Expression
2. Anchoring the Temporal Expression to an event
a. Tensed Predicates.
b. Nominal Expressions ("last week's bombing")
c. Event Composition ("last week's bombing was planned 3 months ago")
Need to review existing tools that perform anchoring of TE to events (Inderjeet, Kamp)
3. Temporal Ordering.
Classification of events that involve a change of state or are relevant in terms of aspectuality: prevent, precede, start, cause, etc.
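One hypothetical way to operationalize such a classification is a table mapping each predicate to the ordering constraint it imposes on its argument events; the relation labels below loosely echo Allen-style interval relations, and both the mapping and the labels are assumptions for illustration:

```python
# Hypothetical mapping from change-of-state / aspectual predicates to the
# ordering constraint each imposes on its argument events in "e1 VERB e2".
ORDERING = {
    "precede": "e1 BEFORE e2",
    "cause":   "e1 BEFORE e2",   # causes precede (or at least begin before) effects
    "start":   "e1 BEGINS e2",   # the starting event initiates e2
    "prevent": "e2 NOT-OCCUR",   # a prevented event never takes place
}

def order_constraint(predicate):
    """Look up the ordering constraint contributed by a predicate."""
    return ORDERING.get(predicate.lower(), "UNKNOWN")

print(order_constraint("Cause"))  # e1 BEFORE e2
```

The interesting cases are the non-ordering ones: "prevent" contributes a non-occurrence rather than an ordering, which is why these predicates need their own class rather than a plain BEFORE/AFTER treatment.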
4. Construction of the Query Sets.
With the participation of the customer.
5. Building corpora
Some consultant? e.g., Patrick Hanks
100 Articles from Reuters: Single language, no reported speech.
100 Articles from ACE (Newswire, Broadcast, newspaper)
100 Articles from DUC (NIST) (Newswire)
100 Articles from PropBank (newspapers).
100 Articles TIDES?
1. Ontology (Antonio Sanfilippo, Chair)
Examine resources wrt encoding of temporal, event class, and aspectual information. Comparative evaluation of ontologies for feature encoding.
Rob Gaizauskas, Graham Katz, Cleo C, James Pustejovsky, Inderjeet Mani, Roser Saurí
IEEE Upper Ontology
ISI Upper Model Netchez Nigel
WordNet and EuroWordNet
LingoMotors Type System
KIF Stanford Ontology
Lotus Discover Server
Leo Obrst (MITRE)
Nicola Guarino's Ontology Page
2. Corpus Collection/Definition (David Day, Temp Chair)
Get permission for use. Create a common representation. Convert the corpora to this common form. Create tools for retrieval and preprocessing of the corpora. Determine justification and characterization of the features of each corpus, relative to our goals.
Each corpus is approximately 100 articles for annotation purposes. Determine who will annotate the text, once the initial specs are frozen. Inter-annotator agreement needs to be measured. Extend and modify Alembic Workbench to allow for annotation of the corpus according to the appropriate tag set. This might directly involve modifying TempEx so as to allow for more robust annotation suggestions during corpus markup.
Drago Radev, Beth Sundheim, Marc Verhagen, Inderjeet Mani, MITRE person, Lisa Ferro.
Modes of use for corpora:
b. Research and discovery
3. Specification and Definition of TenseML (James Pustejovsky, Chair)
Understanding the various knowledge representations that will be included in different components of the language. Follow up the question set, frame the initial set of requirements from this set. Define the tag set so that the expressions can be extensible.
Create a language, not simply a markup. Explore the utility of RDF for encoding relations.
John Frank, Bob Ingria, Graham Katz, Inderjeet Mani (consultant), Jean-Michel Pomareda (consultant), Beth Sundheim, Bran Boguraev (consultant), Antonio Sanfilippo (consultant), Jonathan Rees (consultant).
4. Query Corpus Construction (Drago Radev, Chair)
Examine existing taxonomies of queries and existing question sets (e.g., Encarta log files). Create a typology of questions relating to temporal queries. Look at the Teknowledge set of queries (Jean-Michel). Construct at least three different sets of question classes/sets for development and testing. Some questions have answers in the corpus, but not all queries do. Define the constraints on the query language, integrating this with the specification of TenseML features and functionality.
Jean-Michel Pomareda, Lisa Ferro, Marc Verhagen (consultant, contacting David Elworthy), TREC people, AQUAINT executive committee suggestions.
5. Algorithm Review and Development (José Castaño, Chair)
Define and scope the range of algorithms
(a) Generate TenseML representations from text:
i. Extensions necessary to TempEx. (covering the functionality of TempEx)
ii. Specification of as many of the requirements from the Tempex-extensions as possible (relating to TenseML requirements)
iii. Explore the use of various technologies/algorithms on the features in the specification for TenseML, and find where the wall of analysis is, and where non-local and compositional analysis is needed.
(b) Using the representation in an application (i.e., to answer a question). Create a report/cookbook of how TenseML can be consumed in a specific kind of application in order to do a certain kind of inference. Look at algorithms for event inference and mock up scenarios for how TenseML can support richer temporally based questions in the context of AQUAINT. (This would be the basis for the overall evaluation of the workshop.)
Bran Boguraev, John Frank, José Castaño, Bob Ingria, Marc Verhagen, George Wilson (consultant), Antonio Sanfilippo (consultant),
Communication with AQUAINT PIs, to discuss their problems, approach, issues, solutions, resources.
Preliminary reports from WGs can be sent off to AQUAINT people.
Define criteria for success. Take a corpus annotated according to the specifications of TenseML. Run queries over the corpus, taking advantage of the features and relations that are in TenseML. Results will be representative of richer, temporally based queries. This will be compared with the same sort of queries (or even identical ones) over a tagged corpus (the same corpus) using just named entity recognition and, secondly, the TempEx tagger.
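The comparison just described ultimately reduces to scoring each run's answer set against the gold answers. A minimal sketch, assuming answers are simply document identifiers; the run and gold sets below are invented for illustration:

```python
def precision_recall(retrieved, relevant):
    """Score one run's answer set against the gold answers."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # true positives
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

gold         = {"doc3", "doc7", "doc9"}  # documents answering a temporal query
tenseml_run  = {"doc3", "doc7", "doc9"}  # run over the TenseML-annotated corpus
baseline_run = {"doc3", "doc5"}          # run using NE recognition / TempEx only
print(precision_recall(tenseml_run, gold))   # (1.0, 1.0)
print(precision_recall(baseline_run, gold))
```

Running identical queries against both corpora and comparing these scores would make the contribution of the richer TenseML markup directly measurable.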
Having some way of evaluating the coverage of TenseML against new corpora, even when annotated by hand.
Look at algorithms for event inference and mock up scenarios for how TenseML can support richer temporally based questions in the context of AQUAINT.
Demonstration of how richer questions can be asked of a text corpus, involving Temporal Expressions?