
The Alembic Workbench README

For Version 2.5 (beta), July 25, 1997. Copyright (c) 1996, 1997, The MITRE Corporation. See COPYRIGHT.tcl for details on license agreement.

This file accompanies a distribution of the Alembic Workbench corpus development toolset. This software is copyrighted and distributed under a license agreement (see the COPYRIGHT Information section of this file, or the separate file AWB-COPYRIGHT, also in this directory).

[NOTE: In this file, and in the file INSTALL, $AWB_REL_DIR refers to the directory in which this file is found when it comes out of the tar file, and $AWB refers to the directory directly above this directory (i.e., $AWB_REL_DIR/../). Usually, the $AWB_REL_DIR directory has a name of the form awb--. ]

This version of the Alembic Workbench, 2.4 for Solaris 5.4, has only been tested on systems with the following configurations:

  • Sun SparcStation running Solaris 2.4, Solaris 2.5
  • Sun Ultra running Solaris 2.5.1
  • Sun SparcStation running SunOS 4.1.3 (not all functionality is available on this OS)
    This distribution comes with a pre-compiled version of SPAM, an SGML normalizer. For more information on this software, visit URL http://www.jclark.com/sp/index.htm and see its accompanying copyright information in SPAM-COPYRIGHT.

    This distribution comes with a pre-compiled version of Tcl/Tk. To install a general purpose version of Tcl/Tk for your site, visit URL http://www.smli.com/research/tcl/ or look in the directory $AWB_REL_DIR/TclTk of this distribution of Alembic Workbench to see if a gzipped tar file of Tcl and Tk has been included. Notice its accompanying copyright information in TCL-TK-COPYRIGHT.

    To install, follow the installation instructions contained in the file $AWB_REL_DIR/install/INSTALL (listed below).

    To see a list of features for this version and previous versions, see the file $AWB_REL_DIR/NEW-FEATURES.TXT.

    A relatively complete manual with example screen dumps can be viewed by pointing your web browser at the file $AWB_REL_DIR/manual/index.html.

    If you have any questions or comments, please contact:

        +-------------------------------------------------------------------+
        | Dr. David S. Day       WWW: http://www.mitre.org/                 |
        | MS K329                MII: http://www-crcf.mitre.org/~day/       |
        | The MITRE Corporation  Intelligent Information Access Sec. (G4H)  |
        | 202 Burlington Road    Artificial Intelligence Center, AISC       |
        | Bedford MA  01730 USA  Center for Integrated Intelligence Systems |
        | Phone: (617) 271-2854  Fax: (617) 271-2352  Email: day@mitre.org  |
        +-------------------------------------------------------------------+
        |    Complete contact list:                                         |
        |    David Day         (day@mitre.org)      (617) 271-2854          |
        |    John Aberdeen     (aberdeen@mitre.org) (617) 271-2840          |
        |    Patty Robinson    (par@mitre.org)      (617) 271-8414          |
        |    Lynette Hirschman (lhirschm@mitre.org) (617) 271-7789          |
        +-------------------------------------------------------------------+
    

    COPYRIGHT Information

    -------------------------------------------------------------------

    Except as permitted below ALL RIGHTS RESERVED

    SOFTWARE LICENSE

    The MITRE Corporation (MITRE) provides this software to you without charge to use for your internal purposes only. Any copy you make for such purposes is authorized provided you reproduce MITRE's copyright designation and this License in any such copy. You may not give or sell this software to any other party without the prior written permission of the MITRE Corporation.

    This software is the copyright work of MITRE. No ownership or other proprietary interest in this software is granted you other than what is granted in this license.

    Any modification or enhancement of this software must inherit this license, including its warranty disclaimers. You hereby agree to provide to MITRE, at no charge, a copy of any such modification or enhancement without limitation.

    MITRE IS PROVIDING THE PRODUCT "AS IS" AND MAKES NO WARRANTY, EXPRESS OR IMPLIED, AS TO THE ACCURACY, CAPABILITY, EFFICIENCY, MERCHANTABILITY, OR FUNCTIONING OF THIS SOFTWARE AND DOCUMENTATION. IN NO EVENT WILL MITRE BE LIABLE FOR ANY GENERAL, CONSEQUENTIAL, INDIRECT, INCIDENTAL, EXEMPLARY OR SPECIAL DAMAGES, EVEN IF MITRE HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

    You accept this software on the condition that you indemnify and hold harmless MITRE, its Board of Trustees, officers, agents, and employees, from any and all liability or damages to third parties, including attorneys' fees, court costs, and other related costs and expenses, arising out of your use of this software irrespective of the cause of said liability.

    NOTE: When exporting this product outside the United States of America, the U. S. Commerce Department has determined that no license is required (NLR).

    The Alembic Workbench INSTALL file contents

    Outline of the installation procedure

  • (0) Create a directory to store this and future versions of the Alembic Workbench (if this hasn't been done before). This is the Workbench "top level directory." Place the gzipped distribution of Alembic Workbench in this directory.
  • (1) Change directory to this top level AWB directory (with "cd").
  • (2) Ungzip and untar the distribution file (or, if you have a tape version of the distribution, untar it from tape).
  • (3) Set two environment variables, one (AWB) being the path of the "top level" Alembic Workbench directory, and the second (AWB_REL) being the version number for the version being installed.
  • (4) Run the installation script from this directory.
  • (5) Create (or, if this is an update of a previous Workbench installation, remove and re-create) a link from $AWB/awb.cshrc to $AWB/awb-$AWB_REL/awb.cshrc, created in step 4. All users who wish to use the Alembic Workbench should source the $AWB/awb.cshrc script to define the various environment variables used by the Workbench.
  • (6) Optional: Run the pre-processing installation script to install the Alembic pre-processing scripts and POS lexicon (if applicable).
  • (7) Use the Workbench.
  • (8) Optional: Review other installation notes, including disk space usage.
    To install the Alembic Workbench (AWB)

  • (0) Prior to un-tarring the contents of a distribution, one should create a directory in which to place the Alembic Workbench code and its declarative files. We will refer to this directory as the "top level" Workbench directory, or AWB. After this is created, place the gzipped tar distribution file within it. This file will have one of the following names: awb-2.2.tar.gz, awb-2.2-nopp.tar.gz or awb-2.2-nolisp.tar.gz. If you are installing from a tape, the file is in tar format on the tape, so you should proceed to the next step.
  • (1) Change directory to the "top level" directory (AWB), via:
             unix> cd <top-level-directory>
    
    replacing <top-level-directory> with the actual path on your system.
  • (2) (a) If you are installing from a tape, untar the contents of the tape using the following command:
             unix> tar -xvf <tape-device>
    
    (b) If you are installing from a gzipped tar distribution file, first ungzip the file, as follows:
             unix> gunzip <distribution>.tar.gz
    
    where <distribution> will be something like one of the following: awb-1.20, awb-1.20-nopp or awb-1.20-nolisp, depending upon whether you have a complete distribution (including all of Alembic and its preprocessing scripts), a distribution without Alembic, or a partial Alembic Workbench distribution that excludes a lisp binary image used by some of the Workbench utilities, respectively. Then untar the ungzipped file as follows:
             unix> tar -xvf <distribution>.tar
    
    Alternatively, you can do both in one pass, which is more efficient since no intermediate .tar file is created:
             unix> gunzip --stdout <distribution>.tar.gz | tar -xvf -
    
    Either way, this will create a subdirectory such as awb-1.20 in the top level Workbench directory (AWB). This subdirectory is referred to as AWB_REL_DIR in this and other documentation.
  • (3) Set two environment variables as follows:
             unix> setenv AWB <top-level-directory>
    
    This directory specification should not include the final, trailing slash. The "top level" directory is the one described in steps 0 and 1. Then set the other environment variable:
             unix> setenv AWB_REL <version>
    
    This environment variable should be set to the version currently being installed. For example, if you are installing version 2.2, then you would say "setenv AWB_REL 2.2". This version number should *not* include the "awb-" prefix.
  • (4) Having defined the environment variables above, execute the installation script, as follows:
             unix> $AWB/awb-$AWB_REL/install/install-awb
    
  • (a) The first step of the script is to define the other main environment variable, AWB_REL_DIR.
  • (b) The next task performed by this script is to define the installation- and version-specific awb.cshrc file. (It is this file that should be executed by anyone wishing to use the Alembic Workbench. See below for more details. This script defines a number of other version- and OS-dependent environment variables.)
  • (c) The script will then proceed to transfer a number of directories and their contents from $AWB_REL_DIR to $AWB. These directories contain data and declaration files that can be edited and otherwise modified by users. All the files and directories that remain under $AWB_REL_DIR will be set to read-only, to protect them from inadvertent modification, whereas virtually all the data and declaration files will be world read/write accessible. (Both of these can be controlled by the site installer, of course.) The script will not write over any files that already exist in those version-independent directories (from previous installations).
  • (d) The final step of this script is the execution of the version- and site-specific setup script $AWB_REL_DIR/awb.cshrc. This script defines a number of environment variables in the current shell environment, such as $AWB, $AWB_REL_DIR, $TCL_LIBRARY, etc. This script should be included in the default .cshrc files of those people who are likely to want to use the Workbench. The installer should look at this script carefully. Notice that if the user is already using Tcl/Tk, it redefines the environment variables TCL_LIBRARY and TK_LIBRARY. If these libraries are not consistent with the installed version of Tcl/Tk, *and* if users are actively using Tcl/Tk for other systems, the installer may wish to modify this script so that the Alembic Workbench uses the site-installed version of Tcl/Tk.
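For installers who prefer the Bourne shell to csh, steps (1) through (3) might be sketched as below. This is a hypothetical helper, not part of the distribution: the function names awb_strip_slash and awb_unpack are invented for illustration, and the sketch assumes the distribution file is named awb-<version>.tar.gz and unpacks into awb-<version>/. In sh, export plays the role of csh's setenv; awb.cshrc itself remains a csh script and must still be sourced from csh.

```sh
# Hypothetical Bourne-shell sketch of installation steps (1)-(3);
# not part of the Alembic Workbench distribution.
# Assumes the distribution file is awb-<version>.tar.gz and that it
# unpacks into a subdirectory awb-<version>/.

# $AWB must not end in a slash, so strip any trailing slashes.
awb_strip_slash() {
    printf '%s\n' "$1" | sed 's:/*$::'
}

# Steps (1)-(3): cd to the top-level directory, unpack the distribution
# in one pass, then export AWB and AWB_REL (sh's counterpart to setenv).
awb_unpack() {
    awb_top=$(awb_strip_slash "$1")   # top-level Workbench directory
    awb_ver=$2                        # version only, e.g. 2.5 (no "awb-")
    cd "$awb_top" || return 1
    # gunzip -c is the short form of gunzip --stdout.
    gunzip -c "awb-$awb_ver.tar.gz" | tar -xvf - || return 1
    AWB=$awb_top
    AWB_REL=$awb_ver
    export AWB AWB_REL
}
```

A session might then run `awb_unpack /home/awb 2.5` followed by `"$AWB/awb-$AWB_REL/install/install-awb"` for step (4); /home/awb is only an example path.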
  • (5) At the conclusion of these steps a new version of the Alembic Workbench has been installed. After testing that this is the case, and that it works appropriately, the installer should then change directory (cd) to $AWB and define a link as follows:
              unix> ln -s $AWB_REL_DIR/awb.cshrc awb.cshrc
    

    If such a link already exists, pointing to an earlier version, the installer should remove this link prior to executing the above. The "sourcing" of this script should probably be placed in the .cshrc files of all who intend to use the Alembic Workbench (see below).

    Before running the Workbench, the user needs to explicitly "source" this cshrc script as follows:

             unix> source $AWB/awb.cshrc
    

    To save space, the installer will probably want either to remove the whole sub-directory used by the earlier version, or tar and gzip it (in case it is needed because of problems in the distribution files and/or tape).
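Replacing the awb.cshrc link and archiving the previous release might look like the following in sh. The helper names awb_relink and awb_archive_release are hypothetical, the sketch assumes AWB and AWB_REL_DIR are already set, and awb-2.4 in the usage note is only an example of an earlier release directory.

```sh
# Hypothetical sh helpers for step (5); not part of the distribution.
# Assumes AWB and AWB_REL_DIR are set as described above.

# Replace any existing $AWB/awb.cshrc link (possibly pointing at an
# earlier version) with one pointing at the new release.
awb_relink() {
    rm -f "$AWB/awb.cshrc"
    ln -s "$AWB_REL_DIR/awb.cshrc" "$AWB/awb.cshrc"
}

# Tar and gzip an earlier release directory under $AWB (e.g. awb-2.4),
# deleting the directory only after the archive has been written.
awb_archive_release() {
    (cd "$AWB" && tar -cf "$1.tar" "$1" && gzip "$1.tar" && rm -rf "$1")
}
```

For example, `awb_relink && awb_archive_release awb-2.4`. Chaining with && means the old directory is removed only if tar and gzip both succeed.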

    NOTE to installer: The present version of Alembic Workbench has been built and tested using Tcl 7.4 and Tk 4.0. These versions of Tcl/Tk have been delivered along with the Workbench, and by default are used in running the Workbench. If you wish to integrate the Workbench with a compatible, site-installed version of Tcl/Tk, *and* you have installed a version of the Workbench that includes the .tcl source code (and not a "p-compiled" version in which only .ptcl files are available in the gui subdirectory), you may do so by setting the following environment variables appropriately:

                 AWB_TCL_DIR  Should point to the directory in
                              which the tclsh and wish
                              binaries can be found.
                 TCL_LIBRARY  Should point to the Tcl library.
                 TK_LIBRARY   Should point to the Tk library.
    

    If the versions are compatible and the Workbench runs without any problems, you could save some space by deleting the subdirectories below. (But don't delete them until you have confirmed that the site-installed Tcl/Tk is indeed compatible with Alembic Workbench.)

                 $AWB_REL_DIR/TclTk-4.1.3 and                  
                 $AWB_REL_DIR/TclTk-5.4                        
    
  • (6) This distribution may include the Alembic natural language processing system (consisting of a transformational rule-based approach to phrase parsing) and its associated preprocessing programs (for tokenizing, sentence tagging, part-of-speech tagging, date tagging, etc.). For these systems to be available to the user (via the Text Processing option under the Utilities pulldown menu), you need to install them. Note, however, that the Workbench is fully functional as a manual tagging environment *without* the Alembic NLP system being installed. (Some distributions may not include Alembic and its preprocessing scripts at all.) To install the preprocessing programs, run the following script:
             unix> $AWB_REL_DIR/install/install-preprocess
    
    This script can take a long time, depending on the machine, disk fragmentation, etc.; apart from gunzipping the tar file, most of its running time is spent building a fairly large lexicon database. It requires approximately 60 MB of disk space. If you experience any problems with this installation, please contact John Aberdeen (see contact information below). More detailed information on the part-of-speech tagger can be found within the $AWB/preprocess sub-directories, and is available upon request from John Aberdeen.
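Since the preprocessing install needs roughly 60 MB, an installer might check free space first. The sketch below is hypothetical (awb_free_kb and awb_has_free_kb are invented names) and assumes a df that supports the POSIX -P and -k options, whose output puts the available space, in 1024-byte blocks, in the fourth column of the second line.

```sh
# Hypothetical sh helpers, not part of the distribution.
# Print the free space, in KB, on the filesystem holding directory $1.
awb_free_kb() {
    # -P: one line per filesystem; -k: 1024-byte blocks.
    # Line 2, column 4 of the output is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if at least $2 KB are free under directory $1.
awb_has_free_kb() {
    avail=$(awb_free_kb "$1")
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}
```

For example, `awb_has_free_kb "$AWB" 61440 || echo "warning: less than 60 MB free under $AWB"`, where 61440 is 60 * 1024 KB.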

    IMPORTANT: Until we determine the nature of the incompatibility, the following perl scripts, used in running various pre- and post-processing scripts, must use perl4 and *not* perl5 (notwithstanding the supposed backward compatibility of perl5!). Therefore, the installer should edit these scripts so that they point to the local installation of perl4. These scripts are:

               $AWB_PP_SCRIPTS/prelembic
               $AWB_TOOLS_DIR/copyptf        
               $AWB_TOOLS_DIR/fast-ne-score  
               $AWB_TOOLS_DIR/index2html     
               $AWB_TOOLS_DIR/ne-index       
               $AWB_TOOLS_DIR/normatt        
               $AWB_TOOLS_DIR/print-phrases  
               $AWB_TOOLS_DIR/sgm2ptb        
               $AWB_TOOLS_DIR/strip-tags     
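Editing the scripts above usually means changing their first ("#!") line. A hypothetical sh sketch of that edit follows; awb_use_perl4 is an invented name, and PERL4 is an assumed location that each site should override. It only handles scripts whose first line names a perl interpreter directly, so any script that starts perl some other way still needs to be checked by hand.

```sh
# Hypothetical sh helper, not part of the distribution: point the "#!"
# line of each named perl script at the local perl4.
# PERL4 is an assumed location; override it for your site.
PERL4=${PERL4:-/usr/local/bin/perl4}

awb_use_perl4() {
    for script in "$@"; do
        tmpf="$script.tmp.$$"
        # On line 1 only, replace the interpreter path of a perl "#!" line,
        # keeping any flags (e.g. -w) that follow it.  Writing back with
        # cat (rather than mv) preserves the script's permissions.
        sed '1s:^#![^ ]*perl[^ ]*:#!'"$PERL4"':' "$script" > "$tmpf" &&
            cat "$tmpf" > "$script" &&
            rm -f "$tmpf" || return 1
    done
}
```

For example, `PERL4=/opt/perl4/bin/perl awb_use_perl4 "$AWB_TOOLS_DIR/copyptf" "$AWB_TOOLS_DIR/normatt"`, substituting your actual perl4 path and listing every script above.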
    
  • (7) Using the workbench.

    Any user wishing to use the Workbench should first source the file $AWB/awb.cshrc. Repeat users of the Workbench will probably want to add a line sourcing this script to their own .cshrc file. This file is actually a link (defined above) to the current version of a file that defines all the environment and path variables needed for executing the Workbench. Thereafter, the user can invoke the Workbench in one of two ways:

                unix> alembic-workbench
    
    or simply:
                unix> awb
    
  • (8) Other installation notes.

    NOTE ON SAVING DISK SPACE: The current distribution supports running the Alembic Workbench on both SunOS 4.1.3 machines and Solaris 2.4 and 2.5.x machines. If you plan to run the Workbench on only one of these OSes, there are some files you may tar and gzip (or simply remove) to save space.

    OS-specific directory pairs, one of which can be tarred and gzipped if you do not need to use the workbench on that platform:

               $AWB_REL_DIR/TclTk-4.1.3
          or   $AWB_REL_DIR/TclTk-5.4
    

    OS-specific file and directory pairs, one of which can be gzipped if you do not need to use the workbench on that platform:

               $AWB_REL_DIR/lisp/awb.4.1.3.image (a link) and
               $AWB_REL_DIR/lisp/sunos4/...
    
          or   $AWB_REL_DIR/lisp/awb.5.4.image (a link) and
               $AWB_REL_DIR/lisp/solaris/...
    

    (Some installations may not even include 4.1.3 support, since we have not yet obtained a runtime license for the SunOS 4.1.3 version of Allegro CL.) Of course, you will also probably want to remove from disk (but save on tape) the original .tar.gz files from which this version of the Alembic Workbench was derived. Note that the preprocess.tar.gz file from which the preprocessing scripts are built remains in the directory $AWB_REL_DIR/, so after successfully installing those scripts, you may want to remove this .tar.gz file from disk as well. If you choose not to use the Alembic NLP system at all, then the entire preprocess subdirectory and the preprocess.tar.gz file may be removed, leaving approximately 20 MB used by the Alembic Workbench proper.
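To decide which of the OS-specific pairs above a single-platform site can archive, one might key off `uname -r`, which reports 4.1.3 on SunOS 4.1.3 and 5.x on Solaris 2.x (for example 5.5.1 on Solaris 2.5.1). The sketch below is hypothetical (awb_unused_platform_dirs is an invented name); it assumes AWB_REL_DIR is set and that every machine at the site runs the same OS, and it only prints paths, leaving the archiving or removal to the installer.

```sh
# Hypothetical sh helper, not part of the distribution.  Given the OS
# release reported by `uname -r`, print the platform-specific paths
# (listed above) that this host does NOT need.
awb_unused_platform_dirs() {
    case $1 in
        4.*)  # SunOS 4.1.3 host: the Solaris files are unused
            echo "$AWB_REL_DIR/TclTk-5.4"
            echo "$AWB_REL_DIR/lisp/awb.5.4.image"
            echo "$AWB_REL_DIR/lisp/solaris" ;;
        5.*)  # Solaris 2.x (SunOS 5.x) host: the SunOS 4.1.3 files are unused
            echo "$AWB_REL_DIR/TclTk-4.1.3"
            echo "$AWB_REL_DIR/lisp/awb.4.1.3.image"
            echo "$AWB_REL_DIR/lisp/sunos4" ;;
        *)  echo "unrecognized OS release: $1" >&2
            return 1 ;;
    esac
}
```

For example, `awb_unused_platform_dirs "$(uname -r)"` on a Solaris 2.5.1 host lists the SunOS 4.1.3 paths, which could then be tarred and gzipped or removed.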

    The preprocessing scripts include lexica and rules files for both mixed-case text and all-UPCASE text. If the user wishes to keep the preprocessing scripts but does not need to process all-UPCASE text, the following files may be removed:

    	$PP_SCRIPTS_DIR/pos/lexicon-rules-utilities/CONTEXTUALRULES.upcase
    	$PP_SCRIPTS_DIR/pos/lexicon-rules-utilities/LEXICALRULES.upcase
    	$PP_SCRIPTS_DIR/pos/lexicon-rules-utilities/Makelexica-upcase.csh
    	$PP_SCRIPTS_DIR/pos/lexicon-rules-utilities/upcase-replibrill-lex* (several files)
    	$PP_SCRIPTS_DIR/pos/lexicon-rules-utilities/numeric-upcase-replibrill-lex* (several files)
    	$PP_SCRIPTS_DIR/pos/lexicon-rules-utilities/upcase-replibrill.train.scrubbed+quotes
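Removing exactly the files listed above, and nothing else, might be sketched in sh as follows. The function name awb_remove_upcase_lexica is hypothetical, and the glob patterns simply mirror the "(several files)" entries in the list; it is worth running `ls` on the directory first to confirm what they will match.

```sh
# Hypothetical sh helper, not part of the distribution: delete the
# all-UPCASE lexica and rules listed above from the given
# lexicon-rules-utilities directory.  Other files are left alone.
awb_remove_upcase_lexica() {
    d=$1
    # rm -f ignores names that do not exist, including a glob that
    # matched nothing and was therefore left unexpanded.
    rm -f "$d/CONTEXTUALRULES.upcase" \
          "$d/LEXICALRULES.upcase" \
          "$d/Makelexica-upcase.csh" \
          "$d"/upcase-replibrill-lex* \
          "$d"/numeric-upcase-replibrill-lex* \
          "$d/upcase-replibrill.train.scrubbed+quotes"
}
```

For example, `awb_remove_upcase_lexica "$PP_SCRIPTS_DIR/pos/lexicon-rules-utilities"`.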
    

    NOTE ON 4.1.3 INSTALLATIONS: We currently do not have a license that allows us to distribute 4.1.3 runtime Franz Allegro Common Lisp images. This disables a number of the Workbench utilities (found under the Utilities menu). If your site has its own license for Allegro CL 4.2 or 4.3 on 4.1.3 machines, then contact us and we will probably be able to provide you with a version of the image that can run under 4.1.3 using your installed Allegro CL.

    NOTE ON MANUAL: A very useful manual has been included in this distribution in the directory $AWB_REL_DIR/manual/. These files are HTML-encoded hypertext. If you point your web browser at $AWB_REL_DIR/manual/index.html, all of the subsequent hyperlinks are self-contained (that is, they stay within this same directory).

    Please feel free to contact any of the following people if you have problems or questions regarding the Alembic Workbench.

  • Dr. David Day (617) 271-2854 day@mitre.org
  • Ms. Patty Robinson (617) 271-2849 par@mitre.org
  • Mr. John Aberdeen (617) 271-2840 aberdeen@mitre.org
  • Dr. Lynette Hirschman (617) 271-7789 lynette@mitre.org
  • U.S. Mail for any of the above:

                           M.S. K329
                           The MITRE Corporation
                           202 Burlington Road
                           Bedford, MA  01730
                           U. S. A.
    

