29 Tagging Tools

Tagging Tools Tutorial

Step 1: Creating a New Tagging Project

Step 2: Editing the Morphology Text

Step 3: Compiling the Text

 

Tagging Tools Reference

The New Tagging Project Window

The Main Window

Manually Editing the Tags

The Options Window

The Pretag Window

The Search and Replace Window

The BWT Extraction Window

How Versions are Tagged

 

 The Tagging Tools Window provides a set of tools for facilitating the morphological tagging of Greek New Testament texts. It was originally developed to help with tagging Greek manuscript transcriptions but can be used to tag any Greek New Testament text.

 

The Tagging Window is divided into two panes. The upper pane shows the version(s) currently being tagged. The bottom pane shows a configurable list of other Greek texts which are provided for comparison purposes. The program can be configured to highlight words which vary between the upper and lower panes and between the multiple versions being tagged. The lower pane versions are read only. The text in the upper pane can be edited in place simply by clicking on a morphological code or a lemma. Text forms cannot be edited.

 

The BibleWorks 9 distribution contains tagging projects already configured for each of the new manuscripts. However, except for Sinaiticus, which is complete, these projects are only samples and should not be considered finished.

 

Tagging Tools Tutorial

 

The best way to introduce you to the Tagging Tools package is to walk you through the process of tagging a new Greek New Testament text.

 

Step 1: Creating a New Tagging Project

 

To create a new tagging project, open the Tagging Tools Window by going to Tools | Language Tools | Morphological Tagging Tools from the main window menu, or by going to Tools | Tagging Tools from the Mss Tab. When you do that, you will see a window like the one above. If you then go to File | New Project, you will see a window like the one to the right.

 

BibleWorks ships with a sample version file for use in this tutorial. It is called sa1.txt and contains the Greek New Testament in CCAT format. It is actually the Tregelles text exported and renamed.

 

A. Specifying the Input File

 

In the New Tagging Project window, enter "sa1" for both the Version ID and the Project Name. The new version will appear as SA1 in the BibleWorks Browse window, and the tagging project will be stored in the SA1 subfolder of the BibleWorks tag folder.

 

Click on the button to the right of the first version files box and navigate to the sa1.txt file in the BibleWorks tag/projects folder.

 

Leave the second version files box empty. The provided sample file has no morphology. See the reference section below for details on the kinds of files that you can use for input.

 

B. Specifying the Input Format

 

The sa1.txt file is in CCAT format, so check the CCAT check box.

 

C. Creating the Project

Click on Create to create the project.

 

Step 2: Editing the Morphology Text

You should now see a Tagging Tools Window like the one to the right.

 

Notice that the lemma and morphology slots under each word contain dashes because we did not provide this information. You can fill these slots in manually or use the provided tools to make educated guesses about what belongs in them. This does not remove the need to examine each word in context yourself, but it can reduce the amount of labor involved. BibleWorks can look at other tagged Greek New Testament texts and use that information to insert preliminary values in the lemma and morphology code slots.

 

Use caution with databases produced by this procedure. The results still need to be edited word by word, and some of the databases used to guess the values may be copyrighted, so the original copyright holders may still have a copyright interest in the resulting database. If you want to distribute versions produced by this procedure, please contact BibleWorks and the copyright holders of the databases used.

 

To pretag the new text version, choose Utility | Pretag Texts from the Tagging Tools Window main menu. You should then see a window like the one to the right.

 

 We will start with the very simplest option and leave it to you to experiment later with more complex methods.

 

Select "sa1.txt" in the list of manuscripts to tag on the left. Then select "BNT" in the base text list. When you click the Tag button, BibleWorks will compare your text to the BNT text and add lemma and morphology tags accordingly.

 

Not all words can be tagged, but this gives you a starting point. Depending on the options used, some of the morphology codes may have extra codes appended to indicate the nature of the uncertainty about the guess (these are the tags listed on the right-hand side of the Pretag Window).

 

At this point there are a lot of things you can do to improve the text. Retagging with other options set might help. Using additional base texts might help. You can also click on Utility | Search and Replace to make global changes in lemmas and codes; see below for details on how to do this. After you have done these things, it is a matter of hand editing the lemmas and codes in context.

 

Step 3: Compiling the Text

 

When you have the text the way you want it, you can compile it so that it can be searched in BibleWorks. By default BibleWorks compiles only a text version. If you want to compile accented versions and/or morphology versions, click on File | Options and check the "Build morphology versions" and/or "Build accented versions" check boxes.

For this tutorial just select the option to build morphology versions. Also enter "My Test Version" in the Menu Name box. Now click on the button. The version will be compiled and added to your Browse Window display list.

 

Note that the morphology version was built with the same version name as the text version but with a "-M" added. The name has also been changed to upper case. Now you can close the Morphological Tagging Tools Window.

 

Tagging Tools Reference

 

The Main Window

 

The Tagging Window is divided into two panes. The upper pane shows the version(s) currently being tagged. The bottom pane shows a configurable list of other Greek texts which are provided for comparison purposes. The program can be configured to highlight words which vary between the upper and lower panes and between the multiple versions being tagged. The lower pane versions are read only. The text in the upper pane can be edited in place simply by clicking on a morphological code or a lemma. Text forms cannot be edited.

 

You can drag the lemma and code information from one entry to another. To do this, hold down the SHIFT key and click anywhere in the source entry; it doesn't matter whether you click on the form, lemma or code. Then drag the mouse cursor to the destination entry. If you "drop" the data on a code, only the code will be copied. If you drop it on a lemma, only the lemma will be copied. If you drop it on a form, both lemma and code will be copied. The entry will turn gray to indicate that it has been changed.

 

The main window has a number of controls and buttons which have the following functionality:

 

  This text input box is where you enter the verse that you want to work on. You must use the standard BibleWorks three letter abbreviations for book names. To change the verse type the reference and press the <enter> key.

 

This spin button moves you quickly to the next or previous verse.

This button synchronizes the BibleWorks Browse Window with the current Tagging Tools verse.

This button synchronizes the Tagging Tools verse with the current Browse Window verse.

This option will compile or recompile the tagging version with the current settings.

This option will open the Search and Replace Window.

This option will close the Tagging Tools Window.

 

The Main Menu

 

File | Open Project

This option prompts you for an existing tagging project to open. Tagging projects are identified by the presence of a .tpj file in the BibleWorks "tag" folder; each project has a TPJ file that was saved when the project was created.

File | New Project

This option prompts for input parameters for creating a new tagging project. See below for details.

File | Save Project

This option saves the current project including all edits that you have made in the tagging.

File | Save & Backup Project

This option saves the current project including all edits that you have made in the tagging. It also creates a backup of the .RAM files for the project. The backup is saved in a folder called "backup" in your project folder.

File | Compile

This option will compile or recompile the tagging version with the current settings. The type of compilation done is determined by settings in the "options" window (see the next item). The compiled version is also installed and activated for display in the Browse Window.

File | Options

This option opens the Tagging Tools Options Window. See below for details.

Utility | Extract TXT from BWT

This option prompts you for the location of a transcription file and extracts the transcription into text files. The files will have the RAT and RAN extensions. The RAT files are version text files in BibleWorks format. The RAN files contain the notes attached to verses in the transcription file.

Utility | Pretag Texts

This option opens a window with options that allow you to pretag version texts based on comparison with existing tagged texts. See below for details.

Utility | Search and Replace

This option opens a window that provides tools to do global search and replace operations on the text currently being tagged. See below for details.

 

Manually Editing the Tags

 

The items in the upper window are editable. The lemma and morphological code for each entry can be changed simply by clicking on a lemma or code and editing the text. Items that have been manually edited will change color to gray.
 
Highlighted forms are forms that the comparison shows to differ between the versions being tagged and the base texts in the bottom window. The texts in both windows are compared using a longest common substring algorithm. This feature can be turned off with the "Show differences with base text" option in the File | Options window.
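The word-level comparison behind the highlighting can be illustrated with Python's standard difflib.SequenceMatcher, which repeatedly finds the longest contiguous run of matching words and then recurses on the pieces to its left and right. This is a minimal sketch, not BibleWorks' own implementation; it assumes each verse has already been split into a list of normalized (for example unaccented) words, and the function name differing_positions is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Set


def differing_positions(tagged: List[str], base: List[str]) -> Set[int]:
    """Return indices of words in `tagged` that fall outside every block of
    words shared with `base`; these are the candidates for highlighting."""
    # autojunk=False keeps frequent words (articles, conjunctions) from
    # being treated as "junk" by SequenceMatcher's heuristic.
    matcher = SequenceMatcher(a=tagged, b=base, autojunk=False)
    matched: Set[int] = set()
    for block in matcher.get_matching_blocks():
        matched.update(range(block.a, block.a + block.size))
    return set(range(len(tagged))) - matched
```

For example, comparing "και ο λογος ην προς τον θεον" with a base reading "και ο λογος ην παρα τον θεον" leaves only position 4 (προς) unmatched, so only that word would be highlighted.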
 
You can click and drag data from the bottom window to an item in the top window, or from item to item in the top window. To do this, hold down the SHIFT key and click anywhere in the item to be copied. Then, with the SHIFT key still down, drag the item to the target item. If you drop the source item onto a form, both lemma and code will be transferred. If you drop it on a lemma or a code, only the lemma or the code will be transferred.
 
If you hold down the CTRL key and left click on an item the gray background indicating an edited item will be toggled on and off.
 
If you hold down the ALT key and left-click on an item, the additional codes (+…) will be removed from the item. Do this once you have confirmed that the automatic tags are correct. Holding down SHIFT+CTRL and left-clicking does the same thing.

The New Tagging Project Window

To create a new tagging project, open the Tagging Tools Window by going to Tools | Language Tools | Morphological Tagging Tools from the main window menu, or by going to Tools | Tagging Tools from the Mss Tab, and then choose File | New Project. You will see a window like the one to the right. You need to enter the following information:

 

Version ID


The version ID is the name that will be assigned to the compiled version in BibleWorks. For a single Greek version choose a short descriptive abbreviation. For manuscript project versions you should use a version ID composed of "m-" plus the Gregory-Aland number plus the corrector level. You can have several tagging projects with the same project name and different version IDs. For example, the Sinaiticus project has m-01 as the project name and m-01a, m-01b, m-01c and m-01d as version IDs. But for non-manuscript projects the version ID and project name should be the same.


Project Name


The project name is the name of the folder where the project files will be stored. Tagging projects are stored in the "tag" folder in the BibleWorks directory. In most cases you would want to use the same name for the project and the version ID (see the previous paragraph).


Input Files

 

A new Tagging Project can be started in three different ways, corresponding to the three radio buttons in the New Tagging Project window: from version data files, from an existing installed version, or from a BWT transcription file. These are described below.

For the first method, you specify the data files for the version in the two text boxes. All files should be in BibleWorks format, i.e. one verse per line with the verse reference at the beginning of each line. There are three types of files that can be used with the tagging module:

 

Type 1: Files with text only in this format: book chapter:verse word1 word2 ...

Type 2: Files with morphology only in this format: book chapter:verse lemma1@code1 lemma2@code2 ...

Type 3: Files with morphology and text in one file in this format: book chapter:verse word1 lemma1@code1 word2 lemma2@code2 ...
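To make the three layouts concrete, here is a minimal Python sketch of a parser that accepts any of them. It is not part of BibleWorks; the Entry class and parse_verse_line function are hypothetical, and it assumes that the reference is the first two whitespace-separated tokens (for example "Mat 1:1") and that any token containing "@" is a lemma@code pair.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    form: Optional[str]   # None when the line carries morphology only (type 2)
    lemma: Optional[str]  # None when the line carries text only (type 1)
    code: Optional[str]


def parse_verse_line(line: str) -> Tuple[str, List[Entry]]:
    """Split one verse line into its reference and a list of entries.

    Assumptions (not from the BibleWorks documentation): the reference is
    the first two tokens, and a token containing "@" is a lemma@code pair.
    """
    tokens = line.split()
    reference = " ".join(tokens[:2])
    entries: List[Entry] = []
    for token in tokens[2:]:
        if "@" in token:
            lemma, code = token.split("@", 1)
            last = entries[-1] if entries else None
            if last is not None and last.form is not None and last.lemma is None:
                # Type 3: the pair belongs to the form just before it.
                last.lemma, last.code = lemma, code
            else:
                # Type 2: morphology with no accompanying form.
                entries.append(Entry(None, lemma, code))
        else:
            # Type 1 word, or the form half of a type 3 word/pair.
            entries.append(Entry(token, None, None))
    return reference, entries
```

A type 3 line such as "Mat 1:1 βίβλος βίβλος@code1 γενέσεως γένεσις@code2" yields two entries, each with form, lemma and code filled in; the codes here are placeholders, not real BibleWorks codes.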

 

The options for providing input using the text version and morphology version input boxes are these:

 

Option 1: Enter a type 3 file name (with full path) in the first box. Since the file already contains morphology information enter the same name in the second box. This tells the program that all the information is in a single file.

 

Option 2: Enter a type 1 file name in the first box and a type 2 file name in the second box. The project will be initialized with the text and morphology information from the two files.

 

Option 3: Enter a type 1 file name (with full path) in the first box and nothing in the second box. This option will cause BibleWorks to construct a dummy morphology text with -@- for each word. Use this option if you have no morphology information to start with.
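Options 2 and 3 both amount to producing a type 3 line for each verse. The following is a hypothetical Python sketch of that step, not the program's actual code: combine merges a type 1 line with the matching type 2 line, or inserts the dummy -@- pair for every word when no morphology line is given. It assumes the same two-token reference layout as above and rejects verses whose word and tag counts differ.

```python
from typing import Optional


def combine(text_line: str, morph_line: Optional[str] = None) -> str:
    """Build a type 3 line from a type 1 line and, optionally, the type 2
    line for the same verse. Without morphology every word gets "-@-"."""
    text_tokens = text_line.split()
    reference, words = text_tokens[:2], text_tokens[2:]
    if morph_line is None:
        pairs = ["-@-"] * len(words)          # Option 3: dummy morphology
    else:
        morph_tokens = morph_line.split()     # Option 2: separate files
        if morph_tokens[:2] != reference:
            raise ValueError(f"verse mismatch: {reference} vs {morph_tokens[:2]}")
        pairs = morph_tokens[2:]
        if len(pairs) != len(words):
            raise ValueError(
                f"{' '.join(reference)}: {len(words)} words but {len(pairs)} tags")
    out = list(reference)
    for word, pair in zip(words, pairs):
        out += [word, pair]
    return " ".join(out)
```

For instance, combine("Jhn 11:35 ἐδάκρυσεν ὁ Ἰησοῦς") returns "Jhn 11:35 ἐδάκρυσεν -@- ὁ -@- Ἰησοῦς -@-". Raising an error on a count mismatch is a design choice of this sketch; it keeps words from silently receiving the wrong tags.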

 

Input Format - Normally, text input files should be in BibleWorks Greek font format. If this box is checked, you can provide input in CCAT format and BibleWorks will convert the text on input.

You can also create a new tagging project from an existing installed version. All you have to do is specify the version. This is equivalent to exporting a Greek text from BibleWorks and using the previous method.

You can also start a new tagging project from a transcription file. This option will convert the transcription information in a BWT file to a text file that can be used for input to the tagging tools. All you have to do is enter the path of an existing BWT file or browse to the location using the browse button to the right of the text box. Normally you will want the Version ID of your tagging project to be the same as the name of your BWT file (without the extension).

 

The Options Window

 

The Tagging Tools Options window allows you to set a number of options that determine how various features work.

 

Comparison Versions

This list box contains a list of installed Greek New Testament versions. The items that are selected will appear in the lower pane of the Tagging Tools Window to provide texts which you can compare with the version which you are tagging. Differences between your tagged text and these comparison versions will be highlighted to give you an indication of words that need to be checked.

 

Tag Colors

In the Search and Replace window (see below) you can do searches on your tagged version and have the results highlighted with different colors. The Tag Colors section allows you to set the colors used in the Search and Replace window.

 

Receive verse updates from main window

If this option is checked the verse displayed in the Tagging Window will be updated automatically when the verse in the main Browse Window changes.

Send verse changes to main window

If this option is checked the verse displayed in the main Browse Window will be updated automatically when the verse in the Tagging Window changes.

Show differences with base text

If this option is checked the differences between your tagged text and the comparison versions in the lower pane will be highlighted.

Build accented versions

If this option is checked, when you compile your tagged text, accented versions will be built. This is necessary if you want to do accent-sensitive searches on the text.

Build morphology versions

If this option is checked, when you compile your tagged text, morphological versions will be produced in addition to text versions. The version IDs of the morphological versions will be the same as the version IDs of the text versions but with a "-M" appended.

Create Forms Database (FDB) File

If this option is checked, when you compile the version a forms database will be created and placed in the BibleWorks TAG/FDB folder. These files can be used when you pretag versions.

Is MS Project Version

If this option is checked the version will be compiled as an MS Project version. This means that in the Browse Window version selection menu, it will appear under the "Manuscripts" section. If this option is unchecked the location of the version will be determined as usual by language.

 

 

Menu Name

This is the name that will be used in BibleWorks menus for the compiled version of your tagged version.

 

The Pretag Window

 

The Pretag Window allows you to tag untagged Greek New Testament texts based on comparison with existing tagged texts and indexed lists of tagged words in BibleWorks. Note that pretagging a text deletes all existing tags for the version being tagged, including any manual edits. For that reason, pretagging is normally done only once, at the beginning of your project.

 

To pretag the currently loaded text you select the appropriate options and click on the Tag button. The available options are these:

 

Manuscripts to tag

This list contains all of the versions in your tagging project. It will contain only one version unless you created multiple versions with the same project name (but different version IDs). Click on a version to select or unselect it. You can select one or all of the versions.

Form databases

BibleWorks has indexed lists of the major tagged Greek versions. With these databases BibleWorks can examine each word in your text and see how it is parsed elsewhere. This information is used only as a last resort. Select the form databases that you want to use. We suggest leaving all of them unselected until you see how the other options do. You can retag the version as many times as you want and see how the various options affect the results.

Base texts

This list box contains a list of existing installed Greek New Testament texts. BibleWorks will compare your text verse by verse and word by word with each one, using pattern matching algorithms to find the verse that best matches yours. This information will be used to tag your text. For example, if you export the Robinson-Pierpont text and then pretag it with only the BYZ option selected in this list, you would get back the original tagging of the BYZ version. Select the versions that you want to examine. To deselect a version, click on it a second time. Note that not all of these versions use exactly the same tagging scheme, so you may not get exactly what you want. You might want to start with one version, the one that you expect to be closest to your text. Base texts are processed in the order in which they are listed. The order can be changed by holding the SHIFT key down and dragging the list item that you want to move.

Homonym lists

This list box contains a list of alias databases that can be used by the program when it is comparing your text to other versions. When doing a word by word comparison between two versions, many of the differences are not significant as far as morphological tagging is concerned. When the pretagging routines compare two words, they check the alias list to see if the two words differ only in ways that do not affect morphology. Currently there is a Nomina Sacra list, a moveable nu list and a couple of other more general lists. These are plain text files that you can examine in any text editor. They are .wrd files located in the ../tag/wrd folder in BibleWorks.
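The homonym check can be pictured as mapping every spelling in an alias group to one representative spelling. The Python sketch below illustrates that idea only: the layout of the .wrd files is not described here, so the in-memory groups, the function names, and the unaccented example spellings are all assumptions. The returned list number corresponds to the sequential number that the +xh tag records.

```python
from typing import Dict, List, Optional, Tuple


def build_alias_map(groups: List[List[str]]) -> Dict[str, str]:
    """Map every spelling in a group to the group's first spelling."""
    alias: Dict[str, str] = {}
    for group in groups:
        for word in group:
            alias.setdefault(word, group[0])
    return alias


def same_for_tagging(a: str, b: str,
                     homonym_lists: List[Dict[str, str]]) -> Tuple[bool, Optional[int]]:
    """Return (matched, n): n is the 1-based number of the homonym list that
    produced the match, or None for identical forms or no match."""
    if a == b:
        return True, None
    for n, alias in enumerate(homonym_lists, start=1):
        if alias.get(a, a) == alias.get(b, b):
            return True, n
    return False, None
```

With a hypothetical movable-nu list [["εστιν", "εστι"]] as list 1 and a nomina sacra list [["θεος", "θς"]] as list 2, same_for_tagging("θς", "θεος", ...) reports a match through list 2, which the pretagger would mark with the homonym tag.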

Applied Tags

When BibleWorks tags a form in any other way than direct comparison with the first base text, it adds a tag to the morphology to indicate that the form was tagged in a way that may need to be checked manually. These tags are listed on the right part of the Pretag Window. Some of the tags can be turned off and the spelling of some can be changed. See the section below entitled How Versions are Tagged.

 

+xb Base text tag

If you use more than one base text to tag your text, matches found in base texts other than the first will receive this tag. The sequential number of the base text will be added to the extra code. Words tagged by comparison with the first base text do not receive an extra tag.

+xh Homonym tag

When comparing the tagged text with various base texts, the selected homonym lists will be used to determine if two forms are the same. If a homonym list is used to effect a match, this tag will be added. The sequential number of the homonym list will also be added.

+xo Ordering tag

After a base text is used to tag the text, a second pass is made looking for mismatches that might have been caused by different word order. Any that are found are tagged accordingly and this tag is added.

+xf Form database tag

After the base texts have been used to tag as many items as possible, the selected tag databases are consulted. Another pass is made over all base texts. Forms in the tag databases will be used to tag the untagged items only if (1) the current base text and current tagging text have the same number of words and (2) both texts have an unmatched word and (3) the form is unique to the tag database (i.e. is only parsed one way in the tag database).

+xi Tag isolated matches

If this check box is checked another pass will be made over the text being tagged. It will be compared again with the base texts and any tagged words with untagged words on either side will be tagged. These are forms that probably should be checked.
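A small Python sketch of the isolated-match check, offered as an illustration rather than the program's actual logic: it reads "on either side" as "on both sides" and never flags the first or last word of a verse, both of which are assumptions. The helper add_extra_code is hypothetical and simply appends the extra code to the flagged entries.

```python
from typing import List


def isolated_matches(matched: List[bool]) -> List[int]:
    """Indices of matched words whose neighbours on both sides are unmatched.

    Assumption: the first and last words of a verse are never flagged.
    """
    return [i for i in range(1, len(matched) - 1)
            if matched[i] and not matched[i - 1] and not matched[i + 1]]


def add_extra_code(codes: List[str], indices: List[int], extra: str = "+xi") -> List[str]:
    """Append an extra code (default "+xi") to the codes at the given indices."""
    marked = set(indices)
    return [code + extra if i in marked else code for i, code in enumerate(codes)]
```

For the match pattern [False, True, False, True, True, False], only index 1 is isolated; the pair at indices 3 and 4 supports itself and is left alone.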

+xp Check preposition agreement

If this item is checked all prepositions will be checked and if their case does not agree with the first word with case following the preposition, a tag will be added.
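The agreement rule can be sketched in Python as follows. Real BibleWorks morphology codes are not parsed here: each word is reduced to a hypothetical (part of speech, case) pair, with None for words that have no case. The same function, called with pos="art", covers the article check described next.

```python
from typing import List, NamedTuple, Optional


class Word(NamedTuple):
    pos: str             # hypothetical part-of-speech label: "prep", "art", "noun", ...
    case: Optional[str]  # hypothetical case letter ("n", "g", "d", "a", "v") or None


def case_disagreements(words: List[Word], pos: str = "prep") -> List[int]:
    """Indices of words of the given part of speech whose case differs from
    the case of the first following word that has a case."""
    flagged = []
    for i, word in enumerate(words):
        if word.pos != pos or word.case is None:
            continue
        following = next((w.case for w in words[i + 1:] if w.case is not None), None)
        if following is not None and following != word.case:
            flagged.append(i)
    return flagged
```

A preposition mistakenly coded genitive but followed by an accusative article and noun is flagged at index 0, while the article itself, which agrees with its noun, is not.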

+xa Check article agreement

If this item is checked, the same process as outlined in the previous section will be performed for articles.

+xd Tag dittography

If this item is checked another pass will be made over the text looking for repeated words that are not repeated in the base texts.
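The dittography pass can be illustrated with a short Python sketch, which is an assumption about the logic rather than the actual implementation: a word repeated immediately is flagged unless the same immediate repetition also occurs in the base verse. The index of the second copy is returned.

```python
from typing import List


def dittography(verse: List[str], base: List[str]) -> List[int]:
    """Indices of the second copy of each immediately repeated word in
    `verse` that is not also immediately repeated in `base`."""
    repeated_in_base = {a for a, b in zip(base, base[1:]) if a == b}
    return [i + 1 for i in range(len(verse) - 1)
            if verse[i] == verse[i + 1] and verse[i] not in repeated_in_base]
```

A doubled "ειπεν ειπεν" that the base text lacks is flagged, whereas the genuine double "αμην αμην" is not, because the base text repeats it too.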

+xz Use fuzzy matching with each base

If this item is checked, the word comparison process outlined in step 4 below (under How Versions are Tagged) will use relaxed comparison rules. Matched words must start with the same letter and have more than half of their letters in common. This is useful primarily when retagging a new version of the same text as the base text. When tagging a new text it is a little risky. This process is performed after each base text is compared.
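The relaxed rule can be expressed as a small Python predicate. The text does not define "letters in common" precisely, so this sketch assumes a multiset count (repeated letters count as often as they occur in both words) compared against the length of the longer word; treat it as one plausible reading, not BibleWorks' definition.

```python
from collections import Counter


def fuzzy_match(a: str, b: str) -> bool:
    """Same first letter, and more than half of the letters of the longer
    word shared (counted as a multiset intersection; an assumption)."""
    if not a or not b or a[0] != b[0]:
        return False
    common = sum((Counter(a) & Counter(b)).values())
    return common * 2 > max(len(a), len(b))
```

Under this reading "ιωαννης" and "ιωανης" match (6 shared letters out of 7), "λογος" and "λεγει" do not (only λ and γ are shared), and "αυτος" and "ουτος" fail at once because their first letters differ.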

+xz Use fuzzy matching after all bases

This is the same as the previous item except that it occurs in a second pass over all base texts.

 

Untaggable Words

 

+xu Untaggable
If the pretagger cannot tag a form, a -@+xu entry will be added to the morphology database. This enables you to search for untagged words.

 

Tagging Rules

Require accent match in first pass

This option requires an exact match between the base text and tagged text for a match to occur. The default is for accents to be ignored in the first pass over base texts.

Also check unaccented form in form database

If this item is checked, when tag databases are consulted, a search will be made first with the accented form and then with the accents removed.

Permit forms database match with non-unique forms

Normally matches with the form databases will occur only if the form in question is parsed only one way in the form database. If this item is checked the most frequently occurring parsing will be used if a match occurs.
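The three form-database rules above (unique parse by default, optional unaccented lookup, optional most-frequent parse) can be sketched in Python. This is an illustration only: the FDB file format is not described in this chapter, so the in-memory index, the function names, and the placeholder codes are assumptions, and accent stripping assumes Unicode Greek rather than the BibleWorks font encoding.

```python
import unicodedata
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Parse = Tuple[str, str]  # (lemma, code)


def strip_accents(form: str) -> str:
    """Remove accents, breathings and iota subscripts from Unicode Greek."""
    decomposed = unicodedata.normalize("NFD", form)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)


def build_fdb(words: Iterable[Tuple[str, str, str]]):
    """Index (form, lemma, code) triples by exact form and by unaccented form."""
    exact: Dict[str, Counter] = {}
    plain: Dict[str, Counter] = {}
    for form, lemma, code in words:
        exact.setdefault(form, Counter())[(lemma, code)] += 1
        plain.setdefault(strip_accents(form), Counter())[(lemma, code)] += 1
    return exact, plain


def lookup(form: str, fdb, check_unaccented: bool = False,
           allow_non_unique: bool = False) -> Optional[Parse]:
    exact, plain = fdb
    candidates = [exact.get(form)]                 # accented form first
    if check_unaccented:
        candidates.append(plain.get(strip_accents(form)))
    for parses in candidates:
        if not parses:
            continue
        if len(parses) == 1:                       # parsed only one way
            return next(iter(parses))
        if allow_non_unique:                       # most frequent parsing
            return parses.most_common(1)[0][0]
    return None
```

With two accent variants of the same article parsed two different ways, the unaccented lookup is rejected by default and resolved to the more frequent parsing only when non-unique matches are permitted.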

Convert codes to BW primary codes

If this option is checked, base versions that are coded with the alternate BW coding scheme will be converted on input to the primary coding scheme. See the section on BibleWorks Coding Schemes. The process is not foolproof because there is no one-to-one correspondence between the two schemes, so all tags copied from base texts with alternate coding schemes should be checked. A summary of the conversion process is this:

 

Nouns: The primary scheme uses “common” and “proper noun” tags and the alternate does not. The program adds these by checking the GNT forms database. If a matching form with the proper noun tag is found, it will be used in the conversion.

 

Pronouns:

These conversions will be made:

ro >> rq

rc >> rr

rs >> as

This is imperfect at best and these will need to be checked.

 

Def Article: no change

 

Verb: The following voice tags will be changed:

d >> m

o >> p

n >> e

x >> a

q >> a

 

Adjective:

A ‘-’ will be added to the second position.

 

Adverb:

Only the initial ‘b’ will be kept.

 

Conjunction:

A ‘-’ will be added to mark the missing info.

 

Preposition:

The alternate codes do not have case. The case is determined by looking up the preposition in the GNT forms database. If the preposition occurs with only one case in the database, that case will be used. Otherwise the preposition will be tagged with the same case as the next word in the sentence that has case. If the process fails, a ‘-’ will be used for the case.

 

Because of the uncertainties involved in the conversion, it makes better sense where possible to use versions tagged with the alternate coding scheme as only secondary, not primary base versions. For example, to tag the new Scrivener’s or W-H texts it would be better to use GNT as the primary base text and the old Scrivener’s or W-H as a secondary base text. This will result in a lot of corrections, but at least most of the codes will come from the GNT and not require conversion.
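The explicit substitutions listed above can be collected into lookup tables. This Python sketch covers only the pronoun, voice, and adverb rules. It assumes that the pronoun type is the first two letters of the code, and it converts a voice letter in isolation because this section does not say where the voice sits in a verb code. The example code "rogsm" is hypothetical.

```python
PRONOUN_MAP = {"ro": "rq", "rc": "rr", "rs": "as"}                # from the list above
VOICE_MAP = {"d": "m", "o": "p", "n": "e", "x": "a", "q": "a"}    # from the list above


def convert_pronoun(code: str) -> str:
    """Replace the two-letter pronoun prefix (assumed position) if it is mapped."""
    head, tail = code[:2], code[2:]
    return PRONOUN_MAP.get(head, head) + tail


def convert_voice(voice: str) -> str:
    """Map a single alternate-scheme voice letter to the primary scheme."""
    return VOICE_MAP.get(voice, voice)


def convert_adverb(code: str) -> str:
    """Keep only the initial 'b' of an adverb code."""
    return "b" if code.startswith("b") else code
```

As the text warns, these mappings are mechanical. Output from them still needs the manual review described above.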

 

RAT file is in CCAT format

RAT files are normally in BibleWorks font format, but if you have produced your own RAT files in CCAT format you can check this option so the files can be converted on input. If you generated your project through the New Tagging Project window, you should leave this check box unchecked.

 

Clicking on this button will clear the log pane on the lower left part of the Pretag Window.

 

Clicking on this button will tag the currently loaded tagging version using current settings. All current tags will be lost in the process.

 

How Versions are Tagged

 

When a text is pretagged the following process takes place for each verse:

 

1.     The verse is compared to the same verse in the first selected base text listed in the “Base text” list box. The order can be changed by holding the SHIFT key down and dragging the list item that you want to move. A longest common substring algorithm is used to identify forms common to the two verses. The tagging is copied for common words. No additional tags are added.

2.     If untagged forms remain, a check will be made for matches in the same verses irrespective of order. This finds words that are the same in the two verses but appear in a different order. These are tagged with the “ordering tag” (+xo by default). After this stage is complete, a summary file called report.txt is saved to the project directory. It details all the mismatches remaining after the first pass comparing the text to the base texts.

3.     An attempt is made to tag any remaining words by comparison with the remaining selected base texts, in the order specified. Any tags added by this procedure are modified by the addition of the “Base text tag” (+xb by default).

4.     The procedure followed in the first two steps uses the “homonym” lists selected in the “homonym lists” list box, in the order specified. Words found in these lists will be treated as identical to their “homonym” for tagging purposes. These lists can be hand edited. If a homonym is used to achieve a match the “homonym tag” (+xh by default) will be added.

5.     If there are remaining untagged forms, a search will be made for the forms in other versions if there are items checked in the Form databases list. These forms will be used to tag the untagged items only if (1) the current base text and current tagging text have the same number of words and (2) both texts have an unmatched word and (3) the form is unique to the tag database (i.e. is only parsed one way in the tag database).

6.     Four other checks are made depending on the check box settings in the last four options. If “Tag isolated matches” is checked, a search will be made for items that were matched but have unmatched words on either side. These will be tagged with the “tag isolated matches” tag (+xi by default).

7.     If the “check preposition agreement” check box is checked, a check on preposition case will be made. If the first word with case after a preposition has a case that is not the case of the preposition, the preposition will be tagged (with +xp by default).

8.     The “check article agreement” does the same thing for articles as for prepositions.

9.     The “tag dittography” option tags immediately repeated forms in the manuscript that are not repeated in the base text being used for comparison (+xd is used by default).
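Steps 1 to 3 can be approximated in Python to show how tags flow from the base texts. This sketch is not the BibleWorks algorithm. It uses difflib.SequenceMatcher for the longest-block alignment, runs the order-independent pass after each base text as described under +xo, and omits homonym lists and the form databases. The "+xb2" suffix format and the order in which extra codes are appended are assumptions, and the codes in the example are placeholders.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

BaseWord = Tuple[str, str, str]  # (form, lemma, code) from a tagged base text


def pretag_verse(words: List[str],
                 bases: List[List[BaseWord]]) -> List[Optional[Tuple[str, str]]]:
    """Return a (lemma, code) pair per word, or None if it stays untagged."""
    tags: List[Optional[Tuple[str, str]]] = [None] * len(words)
    for n, base in enumerate(bases, start=1):
        # First base: no extra code. Later bases: "+xb<n>" (assumed format).
        extra = "" if n == 1 else f"+xb{n}"
        forms = [form for form, _, _ in base]
        used = set()  # base positions already consumed for this base
        # Steps 1 and 3: copy tags for words aligned by longest matching blocks.
        matcher = SequenceMatcher(a=words, b=forms, autojunk=False)
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                i, j = block.a + k, block.b + k
                used.add(j)
                if tags[i] is None:
                    tags[i] = (base[j][1], base[j][2] + extra)
        # Step 2: same form elsewhere in the verse, irrespective of order.
        for i, word in enumerate(words):
            if tags[i] is not None:
                continue
            for j, form in enumerate(forms):
                if j not in used and form == word:
                    used.add(j)
                    tags[i] = (base[j][1], base[j][2] + extra + "+xo")
                    break
    return tags
```

With a first base reading "ην ο λογος" and a second base containing "θεος", the verse "ο λογος ην θεος" takes "ο λογος" from the aligned block, "ην" through the ordering pass (+xo), and "θεος" from the second base (+xb2).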

 

The Search and Replace Window

 

If you click on the button the Search and Replace window will open. This window allows you to perform search and replace operations on the loaded manuscript data. Changes are applied to all manuscript texts loaded in the top window.


 To search for items in the tagged texts, enter a form, lemma and code and click on the button. The entry in the main window will change accordingly.

 

Search Strings

 

To make global changes, enter a form, lemma and code to search for. Then enter a lemma, code and color to which the found text will be changed. You can use wildcards for any of the text items. If you check the "Replace substring" check box, the code entered will be treated as a substring search in each code. Otherwise only whole code matches will be matched. The substring search is useful for searching for, removing or highlighting special codes added to the morphology when texts are pretagged.
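The difference between whole-code and substring matching is easiest to see in code. The following Python sketch is hypothetical throughout: it assumes "*" and "?" wildcards (using the standard fnmatch.fnmatchcase), matches the code literally in substring mode and replaces only that part, and uses None for new_lemma or new_code to mirror an unchecked Lemma or Code box.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

Entry = Tuple[str, str, str]  # (form, lemma, code)


def search_replace(entries: List[Entry], form: str = "*", lemma: str = "*",
                   code: str = "*", new_lemma: Optional[str] = None,
                   new_code: Optional[str] = None,
                   substring: bool = False) -> Tuple[List[Entry], int]:
    """Return the updated entries and the number of hits.

    Assumptions: "*" and "?" are the wildcards; in substring mode the code is
    matched literally inside each code and only that part is replaced.
    """
    updated: List[Entry] = []
    hits = 0
    for f, l, c in entries:
        code_ok = code in c if substring else fnmatchcase(c, code)
        if fnmatchcase(f, form) and fnmatchcase(l, lemma) and code_ok:
            hits += 1
            if new_lemma is not None:
                l = new_lemma
            if new_code is not None:
                c = c.replace(code, new_code) if substring else new_code
        updated.append((f, l, c))
    return updated, hits
```

For example, a substring search for "+xo" with an empty replacement strips the ordering tag from every code that carries it. A whole-code search for "N" does not match "N+xo" at all.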

 

Replacement Strings

 

If the “Lemma” or “Code” check box is unchecked then the lemmas or codes will not be changed. So, for example, if you just want to change the color of certain items, uncheck both the lemma and code check boxes and perform the search using the “Search and Replace All” button. The default colors for the items in the color dropdown box can be changed in the main window File | Options window. Verses containing a match will be shown in the list on the left side of the Search and Replace window. You can click on a verse to load it into the main window for editing.

 

The “colors” menu has options to clear and set the color flags maintained for each verse. Note that colors are cumulative if the “colors are cumulative” check box is checked. For example, if you do a search to set a number of words to one color, you can do another search that sets the colors of only some of those words to a new color. Both colors are maintained for the words that are in both searches though you will only see one of them. You can clear a color by doing a search with the color dropdown set to “none”.

 

If the “hits are cumulative” check box is checked, each search will add verses to the list on the left. If it is not checked the list is cleared for each search.

 

Color settings for each verse, which result from Search and Replace operations, are stored in *.HAG files so that they are retained between sessions. These files are also backed up with the *.TAG files.

 

The BWT Extraction Window 

When you select Utility | Extract TXT from BWT the BWT extraction window will open. This option prompts you for the location of a transcription file and extracts the transcription into text files. The extraction produces three files: a version text file (.RAT), a version text plus morphology file (.RAM) and a transcriber note file (.RAN). These files are in BibleWorks version format.

 

These files are the native files for the tagging tools. When you make tagging changes the results are stored in the .RAM file. This file is in BibleWorks version format and contains both the text and morphology data for the version being tagged.