Alembic Workbench User's Guide

5.12. Opening SGML-encoded Files

To open an SGML-encoded file in a language that is not written in the Latin-1 character set, click here.

To open an SGML-encoded file in a Latin-1 language:

Note: Documents that contain complex or implied markup should be normalized. Documents that contain no SGML annotations, or only simple ones, need not be normalized. To validate or normalize SGML markup outside the Alembic Workbench interface, use SPAM or Emacs SGML-mode.

Under the File menu, choose Open Document (Latin-1).

Specify the file to open by typing its pathname or by selecting it with the mouse from the directory listing. Press OK.

If the file is being opened in the Alembic Workbench for the first time, a preprocessing dialog is launched, from which the preprocessing and normalization options are chosen. Click here for more information on preprocessing and normalization.

If the file has previously been opened in the Alembic Workbench, the preprocessing dialog is not launched. Instead, the Workbench consults the files it created when the document was first opened. These files are maintained in an internal format called the Parallel Tag File Format (PTF). In addition to preserving a copy of the original file for archival purposes, this format saves annotations in separate tag and text files in an invisible .ptf subdirectory. If the file has been edited outside of the interface since it was last opened, the Workbench detects the changes and asks the user whether to overwrite the previous versions of the PTF files. If the user selects OK, the invisible files are updated to be consistent with the current state of the file being opened.
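The guide does not say how the Workbench detects an outside edit. As a rough sketch of one way the three cases in the paragraph above could be told apart, the following compares a checksum of the document with a checksum of the archived copy. The archive file name (`<name>.orig`), the function names, and the use of SHA-256 are all assumptions made for illustration; they are not taken from the Workbench or from the PTF format.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def archived_copy_path(document: Path) -> Path:
    """Hypothetical location of the archived original inside the .ptf subdirectory."""
    return document.parent / ".ptf" / (document.name + ".orig")


def open_state(document: Path) -> str:
    """Classify a document as 'first_open', 'unchanged', or 'changed'.

    'first_open' : no archived copy exists, so preprocessing options would be requested.
    'unchanged'  : the archived copy matches, so the saved tag and text files can be reused.
    'changed'    : the document was edited elsewhere, so the user would be asked
                   whether to overwrite the saved files.
    """
    archived = archived_copy_path(document)
    if not archived.exists():
        return "first_open"
    if file_digest(document) == file_digest(archived):
        return "unchanged"
    return "changed"
```

A content checksum is used here instead of a modification time because copying or touching a file changes its timestamp without changing its contents.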

Choose the appropriate preprocessing and normalization options. If normalization is being performed, check that the correct DTD is listed. Press OK.

A scrollable error dialog is launched if the normalization routine encounters any non-compliant SGML. Inspect the message to decide whether the problem needs to be corrected off-line. To ignore the message and continue opening the file, press the OK button in the error dialog. Then press Yes in the confirmation dialog to continue loading the file, or press No to quit loading the file.

A tagset selection dialog is launched, listing the tagsets that were found in the file being opened. By default, all annotations belonging to these tagsets are displayed once the file is loaded. When a tagset is turned off, its annotations are not displayed in the interface. In addition, the annotations of an excluded tagset are not available to the Alembic Workbench for warning the user about overlapping tags. Therefore, to avoid the potential for producing invalid SGML, it is advisable to display all tagsets when loading a file. To speed up the loading of richly annotated files, such as part-of-speech tagged documents, the developers recommend dividing the data into smaller files rather than ignoring tagsets.

Turn off any tagsets to be excluded. Press OK.
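To illustrate why excluded tagsets weaken the overlap warning, here is a minimal sketch of a check for crossing annotations. It assumes that annotations can be represented as character-offset spans with an exclusive end; the `Annotation` type, the function name, and the tag names are hypothetical and do not reflect the Workbench's internal representation.

```python
from typing import List, NamedTuple, Tuple


class Annotation(NamedTuple):
    start: int  # offset of the first character covered
    end: int    # offset one past the last character covered
    tag: str    # element name (illustrative only)


def crossing_pairs(annotations: List[Annotation]) -> List[Tuple[Annotation, Annotation]]:
    """Return pairs of annotations that partially overlap.

    Nested and disjoint spans can be written as well-formed SGML elements;
    spans that cross (one starts inside the other and ends outside it) cannot.
    """
    # Sort by start offset; for equal starts, put the longer span first so that
    # it is treated as the enclosing element.
    ordered = sorted(annotations, key=lambda a: (a.start, -a.end))
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.start >= first.end:
                break  # sorted by start, so no later span reaches into `first`
            if first.start < second.start and second.end > first.end:
                pairs.append((first, second))
    return pairs


def visible(annotations: List[Annotation], hidden_tags: set) -> List[Annotation]:
    """Drop annotations whose tag belongs to a tagset that was turned off."""
    return [a for a in annotations if a.tag not in hidden_tags]
```

If the check only sees the output of `visible(...)`, a new annotation that crosses a hidden one produces no warning, which is the situation the guide advises avoiding by displaying all tagsets.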

The watch cursor is displayed until the file has finished loading.
