Alembic Workbench User's Guide

    8. Normalization and Preprocessing

Why Normalize SGML-encoded files?

When an SGML-encoded file is opened in the Alembic Workbench, it may have to be normalized. The user may normalize the document from within the interface or, alternatively, outside the interface using SPAM. A document should be normalized when it contains complex markup, e.g., a <p> tag that has an implied, rather than an explicit, </p> end tag. If the file does not contain SGML markup, it need not be normalized. The normalization routine uses a Document Type Definition (DTD) to confirm that all SGML annotations are valid SGML. When the normalization routine encounters anomalies, it warns the user. For example, if there are overlapping <p> and </p> tags, an error message is displayed. In addition, some DTDs correct for certain deficiencies in the SGML markup, e.g., closing open <p> tags by inserting missing </p> tags. The DTD is defined according to the markup task and is specified by the user when the SGML file is opened for the first time.
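To make the idea of an implied </p> end tag concrete, here is a minimal sketch in Python. It is not the Workbench's normalization routine, which is driven by a DTD. The function name close_implied_p and the parent parameter are hypothetical. The sketch assumes that paragraphs are never nested and that an open paragraph ends at the next <p> start tag or at the end tag of the enclosing element.

```python
import re

# Matches any start or end tag, e.g. <p>, </p>, <doc id="1">.
TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9.\-]*)[^<>]*>")

def close_implied_p(text, parent="doc"):
    """Insert the </p> end tags that the markup leaves implied.

    Assumption: a paragraph is closed by the next <p> start tag or by
    the end tag of the enclosing element named by `parent`.
    """
    parent = parent.lower()
    out = []
    pos = 0
    p_open = False
    for m in TAG.finditer(text):
        name = m.group(1).lower()
        is_end = m.group(0).startswith("</")
        out.append(text[pos:m.start()])   # text before this tag
        if name == "p" and not is_end:
            if p_open:
                out.append("</p>")        # close the previous paragraph
            p_open = True
        elif name == "p" and is_end:
            p_open = False                # explicit end tag, nothing to add
        elif name == parent and is_end and p_open:
            out.append("</p>")            # close before the parent ends
            p_open = False
        out.append(m.group(0))
        pos = m.end()
    out.append(text[pos:])                # trailing text after the last tag
    return "".join(out)

normalized = close_implied_p("<doc><p>One<p>Two</doc>")
```

Markup that already has explicit </p> end tags passes through unchanged. A real normalizer consults the DTD to decide which end tags may be omitted, so this sketch only illustrates the <p> case described above.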

Why Preprocess SGML-encoded and raw text files?

When the preprocessing dialog is launched, the following preprocessing options are available:
  1. Remove <control-M> characters?
  2. Translate all '&' chars to '&#38;'?
If the file to be loaded contains <control-M> characters (carriage returns, as found in carriage-return/line-feed line endings), the first option will delete them.

If the file to be loaded contains ampersand data characters, the second option will convert them to the SGML character reference '&#38;'.
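The effect of the two options can be sketched in a few lines of Python. This is an illustration only, not the Workbench's own code. The function name preprocess and its parameter names are hypothetical.

```python
def preprocess(text, remove_control_m=True, translate_ampersands=True):
    """Apply the two preprocessing options to the text of a file."""
    if remove_control_m:
        # Control-M is the carriage-return character, "\r".
        text = text.replace("\r", "")
    if translate_ampersands:
        # Replace every '&' with the numeric character reference '&#38;'.
        text = text.replace("&", "&#38;")
    return text

cleaned = preprocess("AT&T said\r\nhello\r\n")
```

Like the option label, the sketch translates every ampersand, so an ampersand that already begins an entity reference would be translated as well.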

How to Normalize and Preprocess SGML-encoded files

Click the Normalize with DTD radio button to the On position.

Click on the Select (DTD) button.

Select the appropriate DTD by navigating through the directory listing or by typing in the pathname of the DTD file. Press OK.

The pathname of the specified DTD will be displayed to the right of the Select (DTD) button.

Notice that the document's top level element is automatically shown in the Top Level Element box. The Workbench defines the top level element as the first start tag in the SGML-encoded document. If this is incorrect, replace it with the correct one.
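The rule "first start tag in the document" can be approximated with a regular expression, as in the Python sketch below. The function name guess_top_level_element is hypothetical, and the pattern is an assumption about what counts as a start tag, not the Workbench's actual logic. It skips end tags and declarations such as <!DOCTYPE ...>, but it does not skip tags that appear inside SGML comments.

```python
import re

# A start tag begins with '<' followed directly by a letter, so end tags
# (</p>) and declarations (<!DOCTYPE ...>) do not match.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9.\-]*)[^<>]*>")

def guess_top_level_element(text):
    """Return the name of the first start tag in text, or None."""
    match = START_TAG.search(text)
    return match.group(1) if match else None

top = guess_top_level_element("<!DOCTYPE doc SYSTEM 'x.dtd'>\n<DOC>\n<p>Hi</DOC>")
```

If the first start tag is not the true top level element, for instance when other markup precedes the document element, the guess is wrong, which is why the Top Level Element box can be edited.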

Choose the appropriate Preprocessing options.

Press OK.

If errors are encountered, a scrollable error dialog is launched. Inspect the errors and press OK. This will soon change!

A confirmation dialog will be launched. To continue loading the file, press Yes. To quit loading the file, press No.
