TERQAS — Kick-Off Meeting

January 30-31, 2002

Attendees: James Pustejovsky, José Castaño, Roser Saurí, Robert Ingria, Marc Verhagen, Drago Radev, Antonio Sanfilippo, John Frank, Beth Sundheim, Bran Boguraev, Lisa Ferro, George Wilson, Inderjeet Mani, Jean Michel Pomareda, Penny Lehtola, Mark Maybury, David Day, and Bev Nunan

 

Table of Contents

January 30, 2002

  1. GOALS AND OBJECTIVES (James Pustejovsky)
  2. Corpus for Analysis: Evaluation and Selection (Beth Sundheim & Lisa Ferro)
  3. TEMPEX TUTORIAL (George Wilson)
  4. GOLD STANDARD CREATION (Drago Radev)
  5. WORKSHOP DELIVERABLES: TenseML AND TIMEBANK (James Pustejovsky)
January 31, 2002

  6. INFERRING TEMPORAL ORDERING IN NEWS (Inderjeet Mani)
  7. SUBGOALS towards Event Timeline Construction (James Pustejovsky)
  8. WORKING GROUPS

     1. Ontology
     2. Corpus Collection/Definition
     3. Specification and Definition of TenseML
     4. Query Corpus Construction
     5. Algorithm Review and Development
     6. Evaluation


January 30, 2002

 

GOALS AND OBJECTIVES

James Pustejovsky

Goals:

  1. Design of TenseML, a metadata standard for marking up events, their temporal anchoring, and how they relate to one another. From this, creation of a Gold Standard.
  2. Design of the algorithms for:

Relevant issues under discussion here:

TIDES

TDT, TDT2

TREC

ACE (Automatic Content Extraction)

There was a long discussion about how to evaluate. It was stated that developing the gold standard will be a cyclic process. The group needs to agree on a subset of data sets, and to work out how evaluation and development relate to QA. Specifics on how to evaluate will be needed by March or April. The government urged the group to have a long-range vision and to think about the future.

One of the data sets that will be used in the workshop is PropBank, which should be available by mid-March.

 

Corpus for Analysis: Evaluation and Selection

Beth Sundheim & Lisa Ferro

Presentation:

TIDES as a text set for Time Expressions.

The project has focussed on time adverbials in isolation (e.g., prepositions are not taken into account, even if they are part of the Temporal Expression). Temporal relations aren’t marked here, either.

The current task isn't the categorization of time adverbials but the assignment of values to expressions. The manual annotation is intended to guide the training of a future automatic annotation system.

The representation is an extension of the ISO standard on temporal markup. The basic notation is in terms of points. Among other issues, TIDES covers: durations, week values, imprecise temporal expressions (summer, etc.), truncation, and other forms of imprecision.
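For illustration, a few value strings in the ISO-8601-derived style described above. This is a sketch only; the rendering of the imprecise and truncated cases is assumed here for illustration, not quoted from the TIDES guideline.

    # Hypothetical expression -> value pairs in a TIMEX2/ISO-8601 style (illustrative only)
    tides_style_values = {
        "January 30, 2002": "2002-01-30",  # fully specified calendar date
        "this week":        "2002-W05",    # week value
        "last summer":      "2001-SU",     # imprecise season expression
        "three months":     "P3M",         # duration
        "the 1990s":        "199",         # truncated (decade-level) value, assumed rendering
    }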

Issues arising from this discussion (and from the evaluation of the tag set):

 

TEMPEX TUTORIAL

George Wilson

TempEx has several levels of heuristics, involving more or less guessing.

It doesn’t use typing; everything is a date.

TEMPEX doesn't cover:

Question arising here:

How far can we get with just finite-state automata (FSA) based on POS tags and heuristics like TempEx?
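As a point of reference for this question, a toy sketch in Python of what a purely finite-state, pattern-based recognizer looks like. This is not TempEx itself and not a proposal; the patterns are invented for illustration.

    import re

    # Toy patterns only; TempEx's actual rules and heuristics are far richer than this.
    MONTH = r"(?:January|February|March|April|May|June|July|August|September|October|November|December)"
    PATTERNS = [
        re.compile(MONTH + r" \d{1,2}(?:, \d{4})?"),                            # "January 30, 2002"
        re.compile(r"\b(?:19|20)\d{2}\b"),                                       # bare year mentions
        re.compile(r"\b(?:yesterday|today|tomorrow|last week|next month)\b", re.IGNORECASE),
    ]

    def find_time_expressions(text):
        """Return (start, end, matched string) spans; overlapping matches are left unresolved."""
        spans = []
        for pattern in PATTERNS:
            for m in pattern.finditer(text):
                spans.append((m.start(), m.end(), m.group(0)))
        return sorted(spans)

    print(find_time_expressions("The meeting was held on January 30, 2002 and resumes tomorrow."))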

 

GOLD STANDARD CREATION

Drago Radev

Relevant points from the presentation:

For the creation of a Gold Standard:

Issues arising from the discussion:

 

WORKSHOP DELIVERABLES: TenseML and TIMEBANK

James Pustejovsky

Would like to see something produced that others could build from.

There is a data-driven part and a question-driven part of the workshop. We probably need to work on both.

There is an implication that the Q&A task involves answering by scanning the entire corpus: you would have to read every document to obtain the time markup.

The idea of looking at timelines arises because you need to look at the entire corpus. The analyst is trying to come up with a story. One possibility is that the gold standard of what transpired is the timeline itself, and then you could have a complete corpus.

TenseML: should be able to represent the timeline
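As one purely illustrative way of thinking about what "representing the timeline" requires, a sketch in Python of the three kinds of objects such a representation seems to need. The class names, attributes, and relation labels here are hypothetical, not part of any agreed TenseML specification.

    from dataclasses import dataclass

    @dataclass
    class Timex:            # a temporal expression with a normalized value
        tid: str
        value: str          # e.g. "2002-W04"

    @dataclass
    class Event:            # an event mention
        eid: str
        predicate: str      # e.g. "kill"
        tense: str          # e.g. "PAST"

    @dataclass
    class TemporalLink:     # a relation between events and/or times
        source: str
        target: str
        relation: str       # e.g. "BEFORE", "INCLUDED_IN"

    # "A killed B last week; the killing had been planned months earlier."
    t1 = Timex("t1", "2002-W04")
    e1 = Event("e1", "kill", "PAST")
    e2 = Event("e2", "plan", "PAST")
    timeline = [TemporalLink("e1", "t1", "INCLUDED_IN"),
                TemporalLink("e2", "e1", "BEFORE")]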

Attributes that could be used to categorize questions:

I.

1) Direct Question. Tensed predicate. Did John kill Jones?

2) Entailed Question. Tensed predicate. Is John dead?

3) Presupposed Questions

II.

Event, Interval, …

III.

1st-order event: A killed B

2nd-order event: A's killing B

IV.

Classic event hierarchy:

eventuality
  - state
  - process
  - transition
      - punctual achievement
      - accomplishment

Other relevant issues:

The question here:

What direction are we taking: from questions to semantic parsing or the other way round?

We can follow both strategies in parallel.



January 31, 2002


 

INFERRING TEMPORAL ORDERING IN NEWS

Inderjeet Mani

Some insights:

www.cs.columbia.edu/~bschiff/time

 

SUBGOALS TOWARDS EVENT TIMELINE CONSTRUCTION

James Pustejovsky

Necessary tasks:

  1. Temporal Expression Recognition
  2. Identification of events
  3. Relations between events and Temporal Expressions

1. Temporal Expression Recognition (levels of analysis)

Increase the power of TEMPEX:

[Quantificational force over any of the properties below]

Particular culture- and organization-related Time Expressions

Syntactic and morphological triggers (era, pre-, post-, …)

Compositional issues:

coercion (e.g., of proper names: Vietnam); see the sketch after this list

uncertainty about how to interpret the denotation of a Temporal Expression
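A minimal sketch of the kind of resource such coercions and triggers would require. The intervals shown are assumptions for illustration only; actual resolutions depend on context and on the intended sense.

    # Hypothetical resolutions for coerced or trigger-based expressions (illustrative only)
    coerced_intervals = {
        "Vietnam": ("1955", "1975"),   # proper name coerced to a war-era interval (assumed dates)
        "pre-war": (None, "1939"),     # morphological trigger; the anchor war is context-dependent
        "the X era": None,             # "era" trigger; needs an anchor before it can resolve at all
    }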

2. Anchoring the Temporal Expression to an event

a. Tensed Predicates.

b. Nominal Expressions (last week's bombing)

c. Event Composition (last week's bombing was planned 3 months ago; sketched below)

Need to review existing tools that perform anchoring of Temporal Expressions to events (Inderjeet, Kamp).
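A worked sketch of example (c), "last week's bombing was planned 3 months ago", with hypothetical ids and values, assuming a document date of 2002-01-30.

    # Two temporal expressions, two events, and the anchoring/ordering among them (illustrative only)
    timexes = {"t1": "2002-W04",   # "last week", relative to the assumed document date
               "t2": "2001-10"}    # "3 months ago" (approximate month value)
    events  = {"e1": "bombing",    # nominal event, anchored by the possessive "last week's"
               "e2": "planned"}    # tensed predicate, anchored by "3 months ago"
    anchors  = [("e1", "t1"), ("e2", "t2")]
    ordering = [("e2", "BEFORE", "e1")]   # event composition: the planning precedes the bombing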

3. Temporal Ordering.

Classification of events that involve a change of state or are relevant in terms of aspectuality: prevent, precede, start, cause, …

4. Construction of the Query Sets.

With the participation of the customer.

5. Building corpora

Possibly a consultant, e.g., Patrick Hanks.

Corpora:

100 Articles from Reuters: Single language, no reported speech.

100 Articles from ACE (Newswire, Broadcast, newspaper)

100 Articles from DUC (NIST) (Newswire)

100 Articles from PropBank (newspapers).

100 Articles from TIDES?

 

WORKING GROUPS

1. Ontology (Antonio Sanfilippo, Chair)

Task Definition:

Examine resources with respect to their encoding of temporal, event-class, and aspectual information. Comparative evaluation of ontologies for feature encoding.

Members:

Rob Gaizauskas, Graham Katz, Cleo C, James Pustejovsky, Inderjeet Mani, Roser Saurí

 

Resources to examine:

Mikrokosmos

Cyc

IEEE Upper Ontology

DAML+OIL

ISI Upper Model Netchez Nigel

EAGLES Review

WordNet and EuroWordNet

CMU?

LingoMotors Type System

SIMPLE Ontology

KIF Stanford Ontology

MetaCarta Resources

Lotus Discovery Server

Lockheed Martin

PROPBANK

FrameNet

Delis

MindNet

Contacts:

Leo Obrst (MITRE)

Nicola Guarino (Ontology Page)

Sergei Nirenburg

 

2. Corpus Collection/Definition (David Day, Temp Chair)

Task:

Get permission for use. Create a common representation. Convert the corpora to this common form. Create tools for retrieval and preprocessing of the corpora. Determine justification and characterization of the features of each corpus, relative to our goals.

Each corpus is approximately 100 articles for annotation purposes. Determine who will annotate the text once the initial specs are frozen. Inter-annotator agreement needs to be measured. Extend and modify the Alembic Workbench to allow for annotation of the corpus according to the appropriate tag set. This might directly involve modifying TempEx so as to allow for more robust annotation suggestions during corpus markup.

Members:

Drago Radev, Beth Sundheim, Marc Verhagen, Inderjeet Mani, MITRE person, Lisa Ferro.

  1. DUC: samples from each of the four types: single event, MEST (multiple events of a single type), biography, kitchen sink. Drago owns this corpus.
  2. ACE: Broadcast News subset (radio and TV). Beth owns this corpus.
  3. Reuters: grain futures and options. Inderjeet owns this corpus.
  4. PropBank: WSJ news articles. James P. owns this corpus. How much of the T/A is marked up?
  5. Examine some further corpora

Modes of use for corpora:

a. Annotation

b. Research and discovery

 

3. Specification and Definition of TenseML (James Pustejovsky, Chair)

Task:

Understand the various knowledge representations that will be included in different components of the language. Follow up on the question set and frame the initial set of requirements from it. Define the tag set so that the expressions are extensible.

Create a language, not simply a markup. Explore the utility of RDF for encoding relations.
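As one hedged illustration of the RDF idea, temporal relations could be written as subject-predicate-object triples. The namespace, URIs, and property names below are invented for illustration only.

    # Hypothetical namespace and property names; not an agreed TenseML/RDF vocabulary.
    triples = [
        ("doc1#e1", "tenseml:anchoredTo", "doc1#t1"),
        ("doc1#t1", "tenseml:value",      "2002-01-30"),
        ("doc1#e2", "tenseml:before",     "doc1#e1"),
    ]
    for subject, predicate, obj in triples:
        print(subject, predicate, obj)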

Members:

John Frank, Bob Ingria, Graham Katz, Inderjeet Mani (consultant), Jean-Michel Pomareda (consultant), Beth Sundheim, Bran Boguraev (consultant), Antonio Sanfilippo (consultant), Jonathan Rees (consultant).

 

4. Query Corpus Construction (Drago Radev, Chair)

Task:

Examine existing taxonomies of queries and existing question sets, including the Encarta log files. Create a typology of questions relating to temporal queries. Look at the Teknowledge set of queries (Jean-Michel). Construct at least three different sets of question classes/sets for development and testing. Some questions have answers in the corpus, but not all queries do. Define the constraints on the query language, integrating this with the specification of TenseML features and functionality.

Members:

Jean-Michel Pomareda, Lisa Ferro, Marc Verhagen (consultant, contacting David Elworthy), TREC people, AQUAINT executive committee suggestions.

 

5. Algorithm Review and Development (José Castaño, Chair)

Task:

Define and scope the range of algorithms:

(a) Generate TenseML representations from text:

i. Extensions necessary to TempEx (covering the functionality of TempEx).

ii. Specification of as many of the requirements from the TempEx extensions as possible (relating to TenseML requirements).

iii. Explore the use of various technologies/algorithms on the features in the specification for TenseML, and find where the wall of analysis lies and where non-local, compositional analysis is needed.

(b) Using the representation in an application (i.e., to answer a question). Create a report/cookbook of how TenseML can be consumed in a specific kind of application in order to do a certain kind of inference. Look at algorithms for event inference and mock up scenarios for how TenseML can support richer temporally based questions in the context of AQUAINT. (This would be the basis for the overall evaluation of the workshop.)
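A toy sketch of part (b): once BEFORE links have been extracted into a TenseML-style representation, a temporal question can be answered by traversing them. The ids and relation names are hypothetical and the function is illustrative only, not a proposed algorithm.

    # Ordered-event links, assumed to have been produced by the (a) pipeline above.
    links = [("planning", "BEFORE", "bombing"),
             ("bombing",  "BEFORE", "arrest")]

    def events_before(target, links):
        """Events directly linked as occurring before the target event."""
        return [src for (src, rel, tgt) in links if rel == "BEFORE" and tgt == target]

    # "What happened before the bombing?"
    print(events_before("bombing", links))   # -> ['planning']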

Members:

Bran Boguraev, John Frank, José Castaño, Bob Ingria, Marc Verhagen, George Wilson (consultant), Antonio Sanfilippo (consultant),

Notes:

Communication with AQUAINT PIs, to discuss their problems, approach, issues, solutions, resources.

Preliminary reports from WGs can be sent off to AQUAINT people.

 

6. Evaluation

Task:

Define criteria for success. Take a corpus annotated according to the specifications of TenseML. Run queries over the corpus, taking advantage of the features and relations that are in TenseML. The results will be representative of richer, temporally based queries. This will be compared with the same sort of queries (or even identical ones) over the same corpus tagged using, first, just named entity recognition and, second, the TempEx tagger.
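One possible (assumed, not agreed) way to score such a comparison is ordinary precision and recall of a system's retrieved answers against the gold-standard answers for each query.

    def precision_recall(system_answers, gold_answers):
        """Compare a system's answer set for one query against the gold answers."""
        system, gold = set(system_answers), set(gold_answers)
        true_positives = len(system & gold)
        precision = true_positives / len(system) if system else 0.0
        recall = true_positives / len(gold) if gold else 0.0
        return precision, recall

    print(precision_recall({"e1", "e3"}, {"e1", "e2"}))   # -> (0.5, 0.5)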

Have some way of evaluating the coverage of TenseML against new corpora, even when they are annotated by hand.

Look at algorithms for event inference and mock up scenarios for how TenseML can support richer temporally based questions in the context of AQUAINT.

Demonstration of how richer questions can be asked of a text corpus, involving Temporal Expressions?

Members:

Donna Harman,