TERQAS Kick-Off Meeting
January 30-31, 2002
Attendees: James Pustejovsky, José Castaño, Roser Saurí, Robert Ingria, Marc Verhagen, Drago Radev, Antonio Sanfilippo, John Frank, Beth Sundheim, Bran Boguraev, Lisa Ferro, George Wilson, Inderjeet Mani, Jean Michel Pomareda, Penny Lehtola, Mark Maybury, David Day, and Bev Nunan
Table of Contents
January 30, 2002
January 31, 2002
2. Corpus Collection/Definition
3. Specification and Definition of TenseML
4. Query Corpus Construction
5. Algorithm Review and Development
GOALS AND OBJECTIVES
Relevant issues under discussion here:
ACE (Automatic Content Extraction)
There was a long discussion about how to evaluate. It was stated that developing the gold standard will be a cyclic process. The group needs to agree on a subset of data sets, and needs to evaluate and develop with respect to how the work relates to QA. By March or April, specifics on how to evaluate will be needed. The government urged the group to have a long-range vision. To think future!
One of the data sets that will be used in the workshop is PropBank and that should be available by mid-March.
Corpus for Analysis: Evaluation and Selection
Beth Sundheim & Lisa Ferro
TIDES as a text set for Time Expressions.
The project has focused on time adverbials in isolation (e.g., prepositions are not taken into account, even if they are part of the Temporal Expression). Temporal relations aren't marked here, either.
The current task isn't the categorization of time adverbials but the assignment of values to expressions. The manual annotation is intended to guide the training of the future computer annotation task.
The representation is an extension of the ISO standard on temporal markup. The basic notation is in terms of points. Among other issues, TIDES covers: durations, week values, imprecise temporal expressions (summer, etc.), truncation, and other forms of imprecision.
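As a rough illustration of this point-based, ISO-derived notation, the sketch below pairs a few expressions with plausible value strings (the values follow TIMEX2-style conventions as we understand them; they are illustrative, not quoted from the TIDES specification):

```python
# Illustrative expression -> value pairs in the point-based, ISO-derived
# notation (TIMEX2-style strings as we understand them; not normative).
examples = {
    "January 30, 2002":       "2002-01-30",  # fully specified point
    "the fifth week of 2002": "2002-W05",    # week value
    "summer":                 "2002-SU",     # imprecise season (year from context)
    "three months":           "P3M",         # ISO 8601 duration
    "the 1990s":              "199",         # truncation marks a whole decade
}
for expr, val in examples.items():
    print(f"{expr!r:28} -> {val}")
```

Note how truncation handles imprecision directly: a decade is simply a year value with the final digit left off.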
Issues arising from this discussion (and from the evaluation of the tagset):
Further topics that can be explored, not handled in TIDES:
Essentially, there are 2 directions to take given TIDES as starting point:
There has to be agreement on:
Certain expressions should be characterized by presenting scaled (or fuzzy) borders (e.g., century meaning approx. 100 years).
Polysemy of temporal expressions should be coded in the markup. Prior to this, there has to be an identification of the different types of ambiguity; e.g., next Monday (either one interpretation or the other), last year (vague), fifth anniversary (event temporal expression).
It could be interesting to give a measure of confidence in the markup as part of the notation (currently there is only a "comments" field). This would make it possible to estimate the rate of possible error accumulation.
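A minimal sketch of what such a confidence field could look like, and how per-annotation confidences might be combined to estimate error accumulation (the record layout and the multiplicative combination are assumptions for illustration, not part of any existing spec):

```python
from dataclasses import dataclass

@dataclass
class TimexAnnotation:
    text: str          # surface expression
    value: str         # normalized value
    confidence: float  # annotator or tool confidence, in [0, 1]
    comment: str = ""  # the existing free-text "comments" field

def chain_confidence(annotations):
    """Crude estimate of accumulated reliability when several uncertain
    annotations feed a single inference: multiply the confidences
    (this assumes the individual errors are independent)."""
    p = 1.0
    for a in annotations:
        p *= a.confidence
    return p

chain = [TimexAnnotation("next Monday", "2002-02-04", 0.9),
         TimexAnnotation("last year", "2001", 0.7)]
print(round(chain_confidence(chain), 2))  # 0.63
```

Even this crude product makes the point: two individually plausible annotations can yield an inference that is markedly less reliable than either.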
On the use of context in the annotation process:
It's still not clear to what level and for what particular purposes to use the context. It may be interesting to contemplate different levels of information according to their origin: just from the temporal expression; from the phrase in which it appears (e.g., including info from prepositions); from the aspectuality of the event; from a supra-sentential level; or from the temporal info of the document (e.g., issuing date). In addition, the annotation should keep track of the level from which the info is obtained.
Similarly, it may be interesting to define different levels of "windows" for anaphora resolution and, as above, to keep track of it.
It would be useful to have a diary collecting problems arising from the marking up (Beth, Lisa). Beth said she has a laundry list of items that should be incorporated into the standards. For example: "the last three summers." It's a set-type annotation, but it is also a duration annotation. Everyone said they would like to see Beth's laundry list.
It was asked how long it takes to train an annotator, and how long it takes to annotate an article. The workshop would like to bring in someone to do the annotating. George Wilson mentioned that Inderjeet had six grad students from Georgetown annotating documents. They had two 3-hour training sessions. The result was an F-score of about .8. The articles were about 1000 words, and each took about 45 minutes to tag.
Single events and multiple events: there is a corpus from NIST. Could be useful.
Remarks and comments:
Suggestion: pick some functions and talk about time as a floating-point number. For TenseML: get the most flexible tense attribute.
Human and computer are interchangeable. Question: need to determine: Whether it's a full grain or background.
We need to tag speech events. Types of events: Persistent chain of events and isolated or ambiguous events.
Descriptor attributes come in handy. What's the database key? Persistent entity.
Future work should draw on a set of the queries of interest to TERQAS consumers.
In addition, it's important to do some corpus analysis in order to know the relevance of what we want to handle. So, future work should be both question- and corpus-driven.
It could be useful to apply an initial filter in order to catch as many expressions as possible (maximizing recall) and then apply a post-refinement process.
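The two-pass idea could be sketched as follows; the candidate pattern and the refinement rule here are toy placeholders, not actual TEMPEX heuristics:

```python
import re

# Pass 1: a deliberately broad pattern -- favors recall over precision.
CANDIDATE = re.compile(
    r"\b(\d{4}|\d{1,2}/\d{1,2}(?:/\d{2,4})?"
    r"|January|February|March|April|May|June|July|August"
    r"|September|October|November|December"
    r"|yesterday|today|tomorrow|last\s+\w+|next\s+\w+)\b",
    re.IGNORECASE,
)

def candidates(text):
    """Pass 1: catch as many potential temporal expressions as possible."""
    return [m.group(0) for m in CANDIDATE.finditer(text)]

def refine(spans):
    """Pass 2: post-refinement -- discard candidates that are clearly
    not temporal (a toy rule; real refinement would use context)."""
    non_temporal = {"word", "straw", "name"}
    return [s for s in spans
            if not (s.lower().startswith(("last", "next"))
                    and s.split()[-1].lower() in non_temporal)]

s = "The last straw came yesterday; next Monday, on 3/15/2002, talks resume."
print(refine(candidates(s)))  # ['yesterday', 'next Monday', '3/15/2002']
```

Here "last straw" is caught by the over-generous first pass and discarded by the second, while the genuinely temporal candidates survive.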
The markup should be monotonic.
NIST Corpus: Dragomir (???)
The annotation has to be independent from the interpreter; i.e., each client should be able to extract whatever it wants and interpret data in the way it wants. This is related to:
TenseML has to be flexible and extendible for future improvements. Make a DTD more than a tagset.
The general architecture of the system has to allow both human and machine work.
Tempex has levels of heuristics. More or less guessing.
It doesn't use typing; everything is a date.
TEMPEX doesn't cover:
Question arising here:
How far can we get with just FSA based on POS and heuristics like TEMPEX?
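As a concrete (and deliberately simplified) version of the question, here is a toy finite-state matcher over POS-tagged tokens; the patterns and the Penn-style tags are illustrative and much cruder than TEMPEX:

```python
MONTHS = {"January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"}

def match_dates(tagged):
    """Scan left to right over (token, POS) pairs; recognize sequences
    like 'January 30 , 2002' (month, day CD, comma, year CD).
    Returns (start, end) index spans."""
    spans = []
    i = 0
    while i < len(tagged):
        tok, pos = tagged[i]
        if tok in MONTHS:
            j = i + 1
            if j < len(tagged) and tagged[j][1] == "CD":      # day number
                j += 1
                if (j + 1 < len(tagged) and tagged[j][0] == ","
                        and tagged[j + 1][1] == "CD"):        # ", year"
                    j += 2
            spans.append((i, j))
            i = j
        else:
            i += 1
    return spans

tagged = [("On", "IN"), ("January", "NNP"), ("30", "CD"), (",", ","),
          ("2002", "CD"), ("the", "DT"), ("meeting", "NN"), ("began", "VBD")]
print([" ".join(t for t, _ in tagged[a:b]) for a, b in match_dates(tagged)])
```

Such a matcher handles fully lexicalized dates cheaply; the open question above is precisely where this style of recognition hits its ceiling (anaphoric, compositional, and context-dependent expressions).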
GOLD STANDARD CREATION
Relevant points from the presentation:
For the creation of a Gold Standard:
Issues arising from the discussion:
TERQAS Gold Standard has to be more standard than gold.
The Gold Standard can be useful for discovering further data (thus, data-driven approach).
Gold Standard in order to get concrete results.
The strategy should be a mixed approach:
Choose corpora that cover different domains and are temporally rich.
ACE corpus, very temporally rich.
The government suggested that, for developing a gold standard for the time standard, maybe we should let TIDES develop it and leverage off it. If there are other programs that are developing standards, then let's not use our resources. They would like to see more work on TenseML. We could use the TIDES or ACE annotation as a baseline but need to go further.
WORKSHOP DELIVERABLES: TenseML and TIMEBANK
Would like to see something produced that others could build from.
There is a data driven part and question driven part of the workshop. We probably need to work on both.
There is an implication that the QA system answers by scanning the entire corpus. You would have to read every document to get the time markup.
The idea of looking at timelines arises because you need to look at the entire corpus. The analyst is trying to come up with a story. One possibility: a gold standard of what transpired is your timeline, and then you could have a complete corpus.
TenseML: should be able to represent the timeline
Attributes that could be used to categorize questions:
1) Direct Question. Tensed predicate. Did John kill Jones?
2) Entailed Question. Tensed predicate. Is John dead?
3) Presupposed Questions.
1st-Order Event: A killed B
2nd-Order Event: A's killing B
Classic event hierarchy:
    event: state, transition, process
    transition: punctual achievement, accomplishment
Other relevant issues:
The question here:
What direction are we taking: from questions to semantic parsing or the other way round?
We can follow both strategies in parallel.
January 31, 2002
INFERRING TEMPORAL ORDERING IN NEWS
SUBGOALS TOWARDS EVENT TIMELINE CONSTRUCTION
1. Temporal Expression Recognition (levels of analysis)
Increase the power of TEMPEX:
[Quantificational force over any of the properties below]
Particular culture- and organization-related Time Expressions
Syntactic and morphological triggers (era, pre-, post-, ...)
coercion (e.g., on proper names: Vietnam)
uncertainty of how to interpret denotation of a Temporal Expression
2. Anchoring the Temporal Expression to an event
a. Tensed Predicates.
b. Nominal Expressions ("last week's bombing")
c. Event Composition ("last week's bombing was planned 3 months ago")
Need to review existing tools that perform anchoring of TE to events (Inderjeet, Kamp)
3. Temporal Ordering.
Classification of events that involve a change of state or are relevant in terms of aspectuality: prevent, precede, start, cause, etc.
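One hypothetical way to operationalize such a classification is a table mapping each predicate to the ordering constraint it imposes on its argument events; the relation labels below loosely echo Allen-style interval relations, and both the mapping and the labels are assumptions for illustration:

```python
# Hypothetical mapping from change-of-state / aspectual predicates to the
# ordering constraint each imposes on its argument events in "e1 VERB e2".
ORDERING = {
    "precede": "e1 BEFORE e2",
    "cause":   "e1 BEFORE e2",   # causes precede (or at least begin before) effects
    "start":   "e1 BEGINS e2",   # the starting event initiates e2
    "prevent": "e2 NOT-OCCUR",   # a prevented event never takes place
}

def order_constraint(predicate):
    """Look up the ordering constraint contributed by a predicate."""
    return ORDERING.get(predicate.lower(), "UNKNOWN")

print(order_constraint("Cause"))  # e1 BEFORE e2
```

The interesting cases are the non-ordering ones: "prevent" contributes a non-occurrence rather than an ordering, which is why these predicates need their own class rather than a plain BEFORE/AFTER treatment.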
4. Construction of the Query Sets.
With the participation of the customer.
5. Building corpora
Some consultant? e.g., Patrick Hanks
100 Articles from Reuters: Single language, no reported speech.
100 Articles from ACE (Newswire, Broadcast, newspaper)
100 Articles from DUC (NIST) (Newswire)
100 Articles from PropBank (newspapers).
100 Articles TIDES?
1. Ontology (Antonio Sanfilippo, Chair)
Examine resources wrt encoding of temporal, event class, and aspectual information. Comparative evaluation of ontologies for feature encoding.
Rob Gaizauskas, Graham Katz, Cleo C, James Pustejovsky, Inderjeet Mani, Roser Saurí
IEEE Upper Ontology
ISI Upper Model Netchez Nigel
WordNet and EuroWordNet
LingoMotors Type System
KIF Stanford Ontology
Lotus Discover Server
Leo Obrst (MITRE)
Nicola Guarino's Ontology Page
2. Corpus Collection/Definition (David Day, Temp Chair)
Get permission for use. Create a common representation. Convert the corpora to this common form. Create tools for retrieval and preprocessing of the corpora. Determine justification and characterization of the features of each corpus, relative to our goals.
Each corpus is approximately 100 articles for annotation purposes. Determine who will annotate the text, once the initial specs are frozen. Inter-annotator agreement needs to be measured. Extend and modify Alembic Workbench to allow for annotation of the corpus according to the appropriate tag set. This might directly involve modifying TempEx so as to allow for more robust annotation suggestions during corpus markup.
Drago Radev, Beth Sundheim, Marc Verhagen, Inderjeet Mani, MITRE person, Lisa Ferro.
Modes of use for corpora:
b. Research and discovery
3. Specification and Definition of TenseML (James Pustejovsky, Chair)
Understanding the various knowledge representations that will be included in different components of the language. Follow up the question set, frame the initial set of requirements from this set. Define the tag set so that the expressions can be extensible.
Create a language, not simply a markup. Explore the utility of RDF for encoding relations.
John Frank, Bob Ingria, Graham Katz, Inderjeet Mani (consultant), Jean-Michel Pomareda (consultant), Beth Sundheim, Bran Boguraev (consultant), Antonio Sanfilippo (consultant), Jonathan Rees (consultant).
4. Query Corpus Construction (Drago Radev, Chair)
Examine existing taxonomies of queries and existing question sets (e.g., Encarta log files). Create a typology of questions relating to temporal queries. Look at the Teknowledge set of queries (Jean-Michel). Construct at least three different sets of question classes/sets for development and testing. Some questions have answers in the corpus, but not all queries do. Define the constraints on the query language, integrating this with the specification of TenseML features and functionality.
Jean-Michel Pomareda, Lisa Ferro, Marc Verhagen (consultant, contacting David Elworthy), TREC people, AQUAINT executive committee suggestions.
5. Algorithm Review and Development (José Castaño, Chair)
Define and scope the range of algorithms
(a) Generate TenseML representations from text:
i. Extensions necessary to TempEx. (covering the functionality of TempEx)
ii. Specification of as many of the requirements from the Tempex-extensions as possible (relating to TenseML requirements)
iii. Explore the use of various technologies/algorithms on the features in the specification for TenseML, and find where the wall of analysis is, and where non-local and compositional analysis is needed.
(b) Using the representation in an application (i.e., to answer a question). Create a report/cookbook of how TenseML can be consumed in a specific kind of application in order to do a certain kind of inference. Look at algorithms for event inference and mock up scenarios for how TenseML can support richer temporally based questions in the context of AQUAINT. (This would be the basis for the overall evaluation of the workshop.)
Bran Boguraev, John Frank, José Castaño, Bob Ingria, Marc Verhagen, George Wilson (consultant), Antonio Sanfilippo (consultant),
Communication with AQUAINT PIs, to discuss their problems, approach, issues, solutions, resources.
Preliminary reports from WGs can be sent off to AQUAINT people.
Define criteria for success. Take a corpus annotated according to the specifications of TenseML. Run queries over the corpus, taking advantage of the features and relations that are in TenseML. Results will be representative of richer, temporally based queries. This will be compared with the same sort of queries (or even identical ones) over a tagged corpus (the same corpus) using just named entity recognition and, secondly, the TempEx tagger.
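The comparison just described ultimately reduces to scoring each run's answer set against the gold answers. A minimal sketch, assuming answers are simply document identifiers; the run and gold sets below are invented for illustration:

```python
def precision_recall(retrieved, relevant):
    """Score one run's answer set against the gold answers."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # true positives
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

gold         = {"doc3", "doc7", "doc9"}  # documents answering a temporal query
tenseml_run  = {"doc3", "doc7", "doc9"}  # run over the TenseML-annotated corpus
baseline_run = {"doc3", "doc5"}          # run using NE recognition / TempEx only
print(precision_recall(tenseml_run, gold))   # (1.0, 1.0)
print(precision_recall(baseline_run, gold))
```

Running identical queries against both corpora and comparing these scores would make the contribution of the richer TenseML markup directly measurable.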
Having some way of evaluating the coverage of TenseML against new corpora, even when annotated by hand.
Look at algorithms for event inference and mock up scenarios for how TenseML can support richer temporally based questions in the context of AQUAINT.
Demonstration of how richer questions can be asked of a text corpus, involving Temporal Expressions?