38 An Overview of the Graphical Search Engine (GSE)

 

 

An Overview of the Graphical Search Engine

An Introduction to the GSE

The Elements of a GSE Search

Building a Query

GSE How To's

Query Processing Phases

An Example

Subqueries

Varieties of Word Boxes

Normal Word Boxes

Range Filter Word Boxes

Agreement Condition Word Boxes

The Invert Results/NOT Operator

 


An Overview of the Graphical Search Engine (GSE)

 

For most applications the BibleWorks Command Line interface provides all the search capabilities you will need. It handles basic Boolean searches quickly and easily. But there are practical limits to how complex a search can get when it is entered on a Command Line. The Graphical Search Engine (GSE) was created to meet the need for a more powerful search engine, one that lets you construct very complex searches using a graphical user interface.

 

The GSE does not currently support Chinese, Arabic, or other multi-byte languages; it does support English, Greek, Hebrew, and other Western languages. Chinese, Arabic, and other multi-byte languages can still be searched from the Command Line.


An Introduction to the GSE

 

Using the Graphical Search Engine (GSE) is quite simple. You construct word boxes that you can click and drag around the screen to arrange them in a logical order. You then draw lines to connect these word boxes to merge boxes which correspond roughly to the familiar AND, OR and NOT operators. To specify ordering and proximity you simply draw lines between the word boxes. You specify agreement by connecting the word boxes with agreement boxes. There are very few limitations on the complexity of the constructions you can put together with this interface. You can save queries for later use, e-mail them to other people, and even plug the results back into new queries. The GSE also supports punctuation-delimited searches, case sensitivity, and multiple version searches.

Users should keep in mind that the GSE is designed for complex queries. It would not make sense to use it to look up single words or phrases unless you need certain capabilities (like case sensitivity) that the Command Line search engine does not support. For that reason we recommend you familiarize yourself with Command Line operations before digging into the complexities of the GSE. This has the added advantage that any search you can type on the Command Line can be transferred and properly formatted for the Graphical Search Engine. The best way to begin learning about the GSE is to do some familiar Command Line searches and transfer them to the GSE interface (simply by opening the GSE with the search still on the Search Window Command Line).

 

Consider, for example, a simple search for the phrase "in the beginning" in the NAS version. The search on the Command Line would look like this:


If you enter this search on the Command Line and then open the GSE, BibleWorks will reformat the command and copy it to the GSE window, where you will see something like this:

 

For a search like this it does not make sense to use the GSE. But when searches become more complex, the Command Line quickly runs out of steam, while the GSE has very few significant limits on the complexity of the searches you can perform.

 

The Elements of a GSE Search

 

The following section describes the different components of a GSE query. It is an overview intended only to explain the basic concepts of the GSE. There are four main elements in a GSE query: word boxes, merge boxes, agreement boxes, and ordering boxes. These boxes are combined to build queries that would otherwise be impossible to express on a single, linear Command Line.

§         Word Boxes  

The word box represents a word, a set of words, a wildcard, a set of wildcards, or a list of references. It is equivalent to the individual words that you type on the Command Line but can represent a much wider class of objects. For example, a single word box could represent one of the following:

A particular word like "love"

Any of a group of words matching a wildcard specification like "lov*"

Any word in one list of words but not in another list

Any word in a specific range or list of verses

Any arbitrary list of words, such as a list of synonyms

 

A GSE query can contain many word boxes. The relationships between them are defined by the connections you make: you insert various objects into the query and connect them to word boxes with lines.

There are three basic kinds of connections that can be made between word boxes:

§         Merge Boxes  

Word boxes can be connected to merge boxes. A merge box represents a Boolean operation (such as AND or OR). When word boxes are connected to a merge box, the verses represented by the individual word boxes are combined using the operator specified in the merge box: if the merge box is set to AND, the results of the underlying word boxes are AND-ed together; if it is set to OR, they are OR-ed together. In addition, multiple merge boxes may be connected to each other to specify very complex Boolean conditions. For example, you could AND several word boxes together, OR several other word boxes together, and then AND the results of both operations to form a kind of search tree.
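To make the search-tree idea concrete, here is a minimal Python sketch that models each word box as a set of verse references and each merge box as an AND/OR node evaluated bottom-up. The `WordBox`, `MergeBox`, and `evaluate` names and the toy references `v1` to `v4` are hypothetical illustrations, not part of BibleWorks, and the sketch ignores version conversion, NOT, and verse proximity.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Union


# Hypothetical model: a word box is reduced to the set of verse
# references found for it before the merge boxes run.
@dataclass
class WordBox:
    verses: FrozenSet[str]


@dataclass
class MergeBox:
    op: str                                   # "AND" or "OR"
    inputs: List[Union["MergeBox", WordBox]] = field(default_factory=list)


def evaluate(node: Union[MergeBox, WordBox]) -> FrozenSet[str]:
    """Combine verse lists bottom-up; a merge box may feed another merge box."""
    if isinstance(node, WordBox):
        return node.verses
    if not node.inputs:
        raise ValueError("a merge box needs at least one input")
    results = [evaluate(child) for child in node.inputs]
    if node.op == "AND":
        return frozenset.intersection(*results)
    if node.op == "OR":
        return frozenset.union(*results)
    raise ValueError(f"unsupported operator: {node.op}")


# Toy data: AND two word boxes, OR two others, then AND both results.
a = WordBox(frozenset({"v1", "v2"}))
b = WordBox(frozenset({"v2", "v3"}))
c = WordBox(frozenset({"v3"}))
d = WordBox(frozenset({"v2", "v3", "v4"}))
tree = MergeBox("AND", [MergeBox("AND", [a, d]), MergeBox("OR", [b, c])])
print(sorted(evaluate(tree)))  # ['v2']
```

Because a merge box accepts either kind of node as input, nesting merge boxes is just recursion, which mirrors the way one merge box's verse list feeds the next.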

 

§         Agreement Boxes

Word boxes may also be connected to agreement boxes. A given agreement box can be set to require agreement in gender, case, number, etc. All word boxes connected to an agreement box will be required to match in the selected agreement conditions. Multiple agreement boxes, each with different agreement conditions, may be used, and a given word box can connect to multiple agreement boxes. This allows very complex agreement conditions to be specified. For example, you could specify that a given word box must agree in gender, case, and number with two other word boxes, but that it must also agree in lemma and part of speech with some other set of word boxes.
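The agreement rules above can be sketched in Python, assuming each word box has already been mapped to one word whose parse is a simple attribute dictionary. The `AgreementBox` class, the attribute names, and the toy parses (in BibleWorks transliteration) are hypothetical; the sketch only shows that every agreement box must hold and that one word box may belong to several boxes with different conditions.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical parse of one word: attribute name -> value.
Parse = Dict[str, str]


@dataclass
class AgreementBox:
    conditions: Sequence[str]   # e.g. ("gender", "case", "number")
    word_boxes: Sequence[str]   # names of the word boxes connected to it


def agrees(words: Sequence[Parse], conditions: Sequence[str]) -> bool:
    """True if all words share one (present) value for every condition."""
    for attr in conditions:
        values = {w.get(attr) for w in words}
        if len(values) != 1 or None in values:
            return False
    return True


def check_agreement(mapping: Dict[str, Parse], boxes: List[AgreementBox]) -> bool:
    """Every agreement box must hold; one word box may join several boxes."""
    return all(
        agrees([mapping[name] for name in box.word_boxes], box.conditions)
        for box in boxes
    )


# Toy parses for the words that four word boxes were mapped to.
mapping = {
    "art": {"gender": "masc", "case": "nom", "number": "sg", "pos": "article", "lemma": "o"},
    "adj": {"gender": "masc", "case": "nom", "number": "sg", "pos": "adjective", "lemma": "agaqoj"},
    "noun": {"gender": "masc", "case": "nom", "number": "sg", "pos": "noun", "lemma": "anqrwpoj"},
    "noun2": {"gender": "masc", "case": "acc", "number": "sg", "pos": "noun", "lemma": "anqrwpoj"},
}
boxes = [
    AgreementBox(("gender", "case", "number"), ("noun", "art", "adj")),
    AgreementBox(("lemma", "pos"), ("noun", "noun2")),
]
print(check_agreement(mapping, boxes))  # True
```

Here "noun" agrees in gender, case, and number with two word boxes and in lemma and part of speech with a different one, matching the kind of layered condition described above.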

 

§         Ordering Boxes    

Finally, word boxes may be connected to each other using ordering boxes. When you draw an ordering connection between two word boxes, an ordering box is automatically inserted between them. Ordering boxes specify ordering requirements between two boxes: you can specify that a given word must come immediately before or after another word, and you can also specify the distance between two words. A word box can have multiple ordering boxes connecting it to multiple word boxes, so complex phrases with multiple endings can be constructed. For example, you could build a query that finds all passages where "[noun1] [verb1] ...[article1] [noun2] [verb2]" or "[noun1] [verb1] ...[verb2] [noun2]" occurs (along with arbitrary agreement conditions set between the various word boxes).
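An ordering box can be thought of as a constraint on the number of words between two matched positions. The Python sketch below assumes that model; `ordering_ok`, its `min_gap`/`max_gap` parameters, and matching a word box by exact token equality are hypothetical simplifications, not actual GSE settings. The test verses are the ones used in the Ordering Test description under Query Processing Phases.

```python
from typing import List, Optional


def positions(tokens: List[str], word: str) -> List[int]:
    """Indexes of every token equal to `word` (a stand-in for a word box match)."""
    return [i for i, t in enumerate(tokens) if t == word]


def ordering_ok(tokens: List[str], first: str, second: str,
                min_gap: int = 0, max_gap: Optional[int] = None) -> bool:
    """True if some `first` precedes some `second` with between min_gap and
    max_gap intervening words (max_gap=None means no upper limit)."""
    for i in positions(tokens, first):
        for j in positions(tokens, second):
            gap = j - i - 1
            if j > i and gap >= min_gap and (max_gap is None or gap <= max_gap):
                return True
    return False


# "D with exactly three words intervening before B":
print(ordering_ok("A A D B A A D".split(), "D", "B", 3, 3))    # False
print(ordering_ok("A A A D A A A B".split(), "D", "B", 3, 3))  # True
# "D immediately before B" is a gap of exactly zero:
print(ordering_ok("A A D B A A D".split(), "D", "B", 0, 0))    # True
```

Trying every pair of positions matters: a verse passes if any occurrence of the two words satisfies the box, not just the first occurrence.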

 


Building a Query

 

If you have a search in mind for the GSE but can't figure out how to build a query to express it, walk through the following steps:

 

§         Step 1: Build your word boxes and merge boxes
Figure out the various words, lemmas, or morphologies that should appear in the results. Build a word box for each one and connect them appropriately to merge boxes. At this point, you can run the query and see the initial list of verses that the GSE will use for the verse test phase.

§         Step 2: Add range filter word boxes or agreement condition word boxes
If your query involves range filter word boxes (e.g., all words before/after another word must have a certain property) or agreement condition word boxes (e.g., if a certain word pattern occurs before/after another word, then apply a certain agreement condition), this is the time to add them to the query window. Don't forget that these types of word boxes aren't connected to merge boxes.

§         Step 3: Add ordering connections
Next, draw ordering connections between all words that have order relationships. This is also the step where you can set up punctuation filters and specify the distance between words. At this point, you can run the query and see the list of matching verses; the list should contain all matching verses, before any agreement tests are applied.

§         Step 4: Add agreement boxes
In this step you add agreement boxes to the query and connect word boxes to the agreement boxes.

§         Step 5: Set options
Finally, if you want to cross verse boundaries, specify search limits, search qere/kethib, etc., set these options in the | Query | Properties |  window.

 

Sometimes you will build a query and get results that don't seem right. Make sure you understand the query processing phases described in the Query Processing Phases section. If you don't understand the order in which the GSE performs each test, your queries will not run the way you expect them to.

 

If this doesn't help, rebuild the query incrementally, one box at a time, running the query after each step and checking the results. First build the query using only the merge boxes and word boxes. If the output looks right, add the agreement condition word boxes, range filter word boxes, and ordering connections one at a time, running the query after each addition and moving on to the next box or condition only when the output still looks right.

 


GSE How To's

 

§         How to open a GSE window using the keyboard or menus:
You can open a GSE window in four ways: from the Tools button below the Command Line, from the main menu Search | Graphical Search Engine, with the F9 function key, or with the GSE Button on the Main Window Button Bar.

 

§         How to select, move, connect, and delete multiple objects:
You can select more than one box in the GSE window at a time. In selection mode, drag a rectangle around the boxes you want to select. Boxes can be added to or removed from a selection by holding down Ctrl and clicking on them. When multiple objects are selected, you can move them by clicking and dragging any of the selected boxes. Likewise, selecting | Edit | Delete |  from the menu will delete all selected boxes.

When multiple word boxes are selected, you can switch to connect mode or ordering mode and drag connections from all of the selected boxes to another box. However, if you need to specify a primary word box for an agreement condition, you should connect the word boxes to the agreement box individually.

 

§         How to disconnect boxes:
When you want to disconnect two or more boxes, select the boxes to be disconnected (at least two) and choose | Edit | Disconnect |  from the menu. To select multiple boxes, hold down the <Ctrl> key while you select them.

 

§         How to build queries more quickly:
The easiest way to build a GSE query is to start on the Command Line and export a query to a GSE window. That way, most of your word boxes and initial connections to merge boxes are already set up. From there you can manually finish the construction.

 

§         How to search on punctuation:
If you want to specify that a certain set of punctuation marks must appear or must not appear between two words, follow these steps:

1.       In the Query Properties window, find the language group for the version you are using in the search, and type the punctuation marks that you want to use into the window for that language group.

2.       Connect each pair of words between which you want to require or exclude punctuation with an ordering connection, and check Require or None.

A Useful Tip: To search for an ordered string of words terminated by a period, add a "*" word box to the end of the query, set the punctuation group to be '.' only, turn on the punctuation flag in the ordering box before the "*", and turn on "Cross Verse Boundaries."

 

§         How to tell which object a window is connected to:
If you open the windows for several merge boxes or word boxes, you may forget which window belongs to which box. To find the box to which a window belongs, click on the window and the box will be highlighted.

 

§         How to decide between verse proximity and word proximity:
Verse proximity is best used in queries without ordering and agreement (Boolean operations only). Word proximity is better used in queries involving ordering or agreement.

 

§         How to use the status bar toggles:
The status bar at the bottom of a GSE window displays the state of several query options. The options can be toggled simply by double-clicking on the option in the status bar.

 

§         How to do topical searches using Louw-Nida Domain lists
If the version for a word box is a Greek morphological database (GNM, BNM, BGM, etc.), you can select Inclusion/exclusion list in the Word Box window and then choose More>> | Add Louw-Nida Domain. A window will open that lets you easily add Louw-Nida Domain Word Lists to the GSE inclusion list. Don't miss the power this gives you: in effect, it lets you search on domains the same way you search on words. The window has two list boxes. The one on the left displays the Louw-Nida domains and sub-domains; when you click on a domain, the corresponding lemmas appear in the right-hand list box. You can select one or more (or all) of these words by clicking on them. As you move between domains you can select different words in each one, and a record is kept of what you have selected.

To find which domains include a particular Greek word, type it in the "Show domains with this string" box and click "Apply filter." You can even use wildcards. This generates an abbreviated domain list. If the "Accents" check box is checked, accents are significant in the string you supply; otherwise they are ignored.

There are buttons on the right to select all the displayed lemmas or to clear the words in the currently selected domain.

When you are ready to copy the list to the GSE inclusion list, click OK. Duplicates are removed before copying, and Louw-Nida entries that are phrases rather than single words are not copied, because the inclusion list cannot currently handle phrases. Phrases are grayed out in the "Words to export" list box as a reminder that they cannot be copied and that you may be losing some of the domain content (though not a significant amount).

 


Query Processing Phases

 

In this section we will discuss what happens behind the scenes when a query is run. We will also discuss each of the screen objects in greater detail. With this understanding, you will have the information you need to build complex queries.

 


 

When processing a query, the Graphical Search Engine passes through several phases in a strict order. Understanding these phases and their order is essential both for building complex queries and for interpreting their results; without it, you will not know how to build queries that find the answers you want.

 

§         In the first phase, called the Verse List phase, the search engine constructs a verse list for each word box. These lists are based on the word, wildcards, or inclusion/exclusion lists specified in the word box. For each word box, every verse which contains a word matching the word box specification is collected into the verse list. Of course, if the word box represents a verse list from disk, no work is done for the word box. For word boxes specifying inclusion/exclusion lists, a verse is included in the verse list if some word in the verse matches the inclusion/exclusion strings.

 

§         The second phase uses the merge boxes to combine the verse lists of the word boxes from the first phase. This is the Boolean Operation phase. During this phase, all word boxes with connection arrows into the merge box have their verse lists converted to the Bible version specified in the merge box. After the verse lists are converted, the AND, OR, and NOT operations specified in the merge box are done on the input verse lists. Verse proximity specifications are also tested here. At the end of this phase, the search engine has generated a single list of verse references per merge box. If a merge box is an input for another merge box, its results (a verse list) are fed into the connected merge box and processed in the same way that a word box verse list is processed. At the very end of the entire phase, a single verse list is produced. This verse list contains all verses that can possibly contain a "hit".  The verse list represents the results that you would get if you removed all ordering, agreement, range filters, and agreement conditions from the query.

 

§         The third phase, the Verse Test phase, walks through each verse in the verse list produced at the end of the second phase and examines the text of each verse. For each verse, the following tests and actions are performed:

 

1.       Match List Construction
Which combinations of the words specified in the word and merge boxes are in this verse? A given query may have more than one possible combination of word boxes that may satisfy the query. For example, if a query specifies "(word A OR word B) AND word C", then "word A" and "word C" is a possible combination of word boxes that will satisfy the Boolean conditions. Likewise, in this example, "word B" and "word C" is another possible combination of word boxes that will satisfy the Boolean conditions. In this phase, all possible combinations of word boxes that will satisfy the Boolean conditions are tested. A match list is a single combination of word boxes that will satisfy the merge box Boolean conditions. All of the match lists for the query are collected into a list of match lists. The GSE must next see which, if any, of the match lists can be mapped to the text of the current verse. If each of the word boxes in a given match list can be mapped to a word in the text of the current verse, the match list is a possible hit.

For example, if a query specified "Find all words A or B or C, where A or B or C occur before D", the possible match lists are "A and D" or "B and D" or "C and D". A given verse may only contain "B and D" and "C and D", so only "B and D" and "C and D" would be put into the list of match lists for this given verse. When the next verse is examined, however, it may be that only "B and D" occurs, so for that verse, "B and D" would be the only match list in the list of match lists. Note that match lists are only collections of words and wildcards -- nothing about ordering is included. At this point, however, any absolute word position tests are also done (e.g. a word box specifies that a word must occur at the beginning or at the end of the verse).

2.       Ordering Test
In this verse, are the ordering conditions satisfied by one of the match lists? For example, if the list of match lists for a given verse only contains "B and D", and an ordering box specifies that D must occur with exactly three words intervening before B, and the text of the verse is "A A D B A A D", the verse would fail this test. If the text of the verse contained "A A A D A A A B", the verse would pass this test.

3.       Agreement Test
In this verse, are the agreement conditions satisfied by one of the match lists?

4.       Punctuation Test
In this verse, are the punctuation conditions satisfied by one of the match lists?
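Match List Construction can be sketched as expanding the Boolean tree: an OR contributes any one of its alternatives, and an AND takes one alternative from each input and merges them. The Python below is a minimal sketch under stated assumptions: the tuple-based query tree is hypothetical, and each word box is simplified to match only the word equal to its name. It reproduces the "A or B or C before D" example from step 1 (ordering is left to the Ordering Test).

```python
from itertools import product
from typing import FrozenSet, List, Sequence, Tuple, Union

# Hypothetical query tree: a leaf is a word-box name; an inner node is
# ("AND", [children]) or ("OR", [children]).
Node = Union[str, Tuple[str, list]]


def match_lists(node: Node) -> List[FrozenSet[str]]:
    """Every combination of word boxes that satisfies the Boolean tree."""
    if isinstance(node, str):
        return [frozenset([node])]
    op, children = node
    child_lists = [match_lists(child) for child in children]
    if op == "OR":
        # Any single alternative from any child is enough.
        return [ml for lists in child_lists for ml in lists]
    if op == "AND":
        # One alternative from each child, merged into one match list.
        return [frozenset().union(*combo) for combo in product(*child_lists)]
    raise ValueError(f"unsupported operator: {op}")


def candidates_for_verse(node: Node, verse_words: Sequence[str]) -> List[FrozenSet[str]]:
    """Keep the match lists whose word boxes all map to words in this verse.
    Simplification: each word box matches only the word equal to its name."""
    present = set(verse_words)
    return [ml for ml in match_lists(node) if ml <= present]


query = ("AND", [("OR", ["A", "B", "C"]), "D"])
print([sorted(ml) for ml in match_lists(query)])
# [['A', 'D'], ['B', 'D'], ['C', 'D']]
print([sorted(ml) for ml in candidates_for_verse(query, ["B", "x", "C", "D"])])
# [['B', 'D'], ['C', 'D']]
```

Only the surviving match lists go on to the ordering, agreement, and punctuation tests, which is why a verse can pass through one combination of words even when another combination fails.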

 

An Example

 

Let's use the following Granville Sharp query to illustrate the different phases (see Example 10: The Granville Sharp Rule):

In the first phase, the GSE builds four reference lists, one for each of the following word boxes: the "*@d*" word box (the one before the first noun), the two "*@+/-v{pr}..." word boxes, and the "kai" word box. The verse list for the "*@d*" word box contains all verses where a word matching "*@d*" occurs. The "*@+/-v{pr}..." word box contains multiple specifications.  It specifies nouns and certain participles, so the verse lists for these word boxes contain all verses where a word matching "*@n*" or "*@v{pr}..." occurs.  The verse list for the "kai" word box contains all verses that have the word "kai" in them.

 

In the second phase (the Boolean Operations phase), these four reference lists are AND‑ed together, producing a single reference list.

 

In the third phase, the verse test phase, the GSE looks at each verse and performs the following tests on each verse to decide whether to keep the verse or eliminate it. The Match List Construction is trivial for this query since there is only one possible match list ("*@d*, *@n*, kai, and *@n*").

 

§         Ordering Test: In this verse, do these words appear in an "article-noun-kai-noun" ordering (zero words between the first "*@d*" and the first "*@n*", with at most two words between all other words)?

 

§         Agreement Test: If so, do the three words agree in gender, case, and number? Also, if any articles appear between the kai and the second noun, do they all disagree in case with the second noun?

Subqueries

 

Understanding query processing phases prepares you to use subqueries, with which you can run multiple queries in a single GSE window and combine query results in another query (see GSE Examples, Example 15).  Subqueries are useful for comparing phrases in different Bible versions. They can also be used to eliminate phrases in a query.

 

Each subquery is processed as a single query, using the query processing phases above. Subqueries are run one at a time and can be nested in subqueries. Subqueries lower in the tree run first. When an individual subquery is finished, the result is simply a list of verses. So the entire subquery can then be thought of as nothing more than a single word box using a reference list from disk. This verse list is passed up to the merge box above the subquery, and is processed during the parent query's Boolean operations phase.

 

This feature is especially useful if you want to run a query that checks ordering or agreement in more than one Bible version. For instance, to find all verses where the BGM has "o anqrwpoj", the NAS has "the man", and the NIV has "the man", you would build a query composed of three subqueries: one to find "o anqrwpoj" in the BGM, one to find "the man" in the NAS, and one to find "the man" in the NIV. The merge box for each subquery would have "Make subquery" checked, and the three merge boxes would be joined with outgoing links to a fourth AND merge box. A sample query, entitled "subq1.qf," is included.

 

Varieties of Word Boxes

 

A word box can represent a word/wildcard, a reference list saved on disk, or a set of words and wildcards to include or exclude. In addition, there are three different "flavors" of word boxes: normal word boxes, range filter word boxes, and agreement condition word boxes. In this section we will discuss the different types of word boxes.

 

§         Normal Word Boxes

The three different kinds of word boxes can be best understood if we limit the discussion to word/wildcard boxes (boxes that represent a single word or wildcard such as "Lord" or "faith*"). In these cases the normal word box (specified by checking the "Normal" option in the word box window) is used in the first and third query processing phases. In the first phase (the verse list phase), the word or wildcard specified in the word box is used to find all verses containing an occurrence of the word or wildcard. For instance, if the word box specified "faith*", all verses containing a word starting with "faith" would be put into the verse list in this phase. In the third phase, the search engine first determines if at least one word matching the word or wildcard occurs in the text of the given verse. If it does, the match lists containing the word box are candidates for producing a hit.
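The first-phase behavior of a normal word box can be sketched as collecting every verse that contains at least one matching word. This Python sketch assumes a toy text (a dictionary of verse references to word lists) and uses the standard `fnmatch.fnmatchcase` function for `*` and `?` wildcards; that is a stand-in, not the BibleWorks wildcard engine, and it is case-sensitive and does not model morphology codes. The optional exclusion patterns illustrate the "in one list but not another" idea from the inclusion/exclusion lists.

```python
from fnmatch import fnmatchcase
from typing import Dict, FrozenSet, List, Sequence

# Hypothetical stand-in for a Bible version: verse reference -> its words.
Text = Dict[str, List[str]]


def word_matches(word: str, include: Sequence[str], exclude: Sequence[str] = ()) -> bool:
    """A word matches if it fits an inclusion pattern and no exclusion pattern."""
    return (any(fnmatchcase(word, p) for p in include)
            and not any(fnmatchcase(word, p) for p in exclude))


def build_verse_list(text: Text, include: Sequence[str],
                     exclude: Sequence[str] = ()) -> FrozenSet[str]:
    """Verse List phase: collect every verse containing at least one matching word."""
    return frozenset(
        ref for ref, words in text.items()
        if any(word_matches(w, include, exclude) for w in words)
    )


text = {
    "v1": ["faith", "hope"],
    "v2": ["faithful", "servant"],
    "v3": ["love", "joy"],
}
print(sorted(build_verse_list(text, ["faith*"])))                # ['v1', 'v2']
print(sorted(build_verse_list(text, ["faith*"], ["faithful"])))  # ['v1']
```

The result says nothing about where in the verse the word occurs; that is deliberately left to the third phase.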

 

§         Range Filter Word Boxes

Range filter word boxes are only used in the third query processing phase. They are ignored in the first and second phase. Range filter word boxes are not even connected to merge boxes. This type of word box is used when we want to require that a range of words before, after, or between two normal word boxes must match a certain word or wildcard. It is also used to specify that a range of words must meet one or more agreement specifications. The word or wildcards in a range filter word box are only examined in the third query processing phase. For example, if we wanted to find all verses containing word A and word B where all words between A and B must be accusative, we would use a range filter word box between A and B to specify the accusative condition. Another example requiring a range filter is a query where word A occurs somewhere before word B and no nouns may occur between A and B. A range filter word box between A and B would be used to enforce the condition that no nouns occur between the two words. Finally, if you wanted to specify that all words between A and B must agree with word B in gender, case, and number, a range filter word box must be used between the word box for A and the word box for B. An example is given in Example 6 (see the GSE Examples section).

 

There are two types of range filter word boxes. The distinction between these two types is subtle, but important. The easiest way to explain the difference between the two types is to give an example. In this example, we have a normal word box describing, say, a noun. Preceding the word box, we have a range filter word box that specifies that the two words preceding the noun must not be an article. Now, this query may be interpreted in two ways. The first interpretation says, "Find all nouns and ensure that there are no articles in the two preceding words." The second interpretation says, "Find all nouns where there are two other words preceding the noun. Furthermore, the two words preceding the noun must not be articles." In the first interpretation, the two words represented by the range filter word box may or may not exist, but they must not be articles. For instance, a noun appearing at the beginning of the verse will not have any words preceding it at all, if the search is limited to single verses (when the "cross verse boundaries" option is off), so such nouns will always be considered hits. In the second interpretation, the two words represented by the range filter word box must exist (and they must not be articles). If a noun occurs at the very beginning of a verse, and the "cross verse boundaries" option is off, then this noun is not a hit, because it does not have two words preceding it.

 

Both of these interpretations illustrate a flexibility that the user needs to have. To tell the GSE how to interpret your query, you use one of the two types of range filter word boxes. These two types let you specify how to treat queries that have hits right at the beginning or end of a verse. When a query specifies that a search does not cross verse boundaries, the GSE restricts the scope of its search to a single verse at a time. When a query uses a range filter word box to describe words that must appear at the very beginning or at the very end of a phrase and you want to insist that the words described by the range filter word box must exist, use the range filter option labeled "Range filter (all specified words must match and must exist within the verse bounds setting)". When a query uses a range filter word box to describe a specification that you want to apply to words only if they exist within the verse boundaries (depending on the setting of the "cross verse boundaries" option), use the range filter option labeled "Range filter (all specified words must match IF they are within the verse bounds setting)."
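The difference between the two range filter types can be sketched with the "no article in the two preceding words" example, assuming "cross verse boundaries" is off so the token list is a single verse. In this hypothetical Python model, tokens are toy part-of-speech tags and `must_exist` selects between the two interpretations; none of these names come from BibleWorks.

```python
from typing import Callable, List


def range_filter_ok(tokens: List[str], anchor: int, n: int,
                    allowed: Callable[[str], bool], must_exist: bool) -> bool:
    """Check the n words just before tokens[anchor] within one verse.

    must_exist=True  ~ "all specified words must match and must exist"
    must_exist=False ~ "all specified words must match IF they are within
                        the verse bounds"
    """
    start = anchor - n
    if start < 0 and must_exist:
        return False  # fewer than n preceding words inside the verse
    window = tokens[max(start, 0):anchor]
    return all(allowed(word) for word in window)


def not_article(tag: str) -> bool:
    return tag != "article"


# A noun at the very beginning of a verse (anchor index 0):
verse = ["noun", "verb", "noun"]
print(range_filter_ok(verse, 0, 2, not_article, must_exist=False))  # True
print(range_filter_ok(verse, 0, 2, not_article, must_exist=True))   # False
# An article within the two preceding words fails either way:
print(range_filter_ok(["verb", "article", "adj", "noun"], 3, 2, not_article, False))  # False
```

The only difference between the two modes is what happens when the range runs past the verse boundary: the lenient mode checks whatever words exist, while the strict mode rejects the hit outright.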

 

Note that the GSE does not cross the boundaries between Psalm chapters or between books, even if the "cross verse boundaries" option is on. In other words, a "hit" will never span two books or two Psalm chapters.

 

§         Agreement Condition Word Boxes

An agreement condition word box is used when you want a connected agreement box to be enforced only under certain conditions. Without an agreement condition word box, an agreement box's condition is always enforced. An agreement condition word box does not require the word or wildcard in the word box to exist; it only specifies that if a word in the verse matching the word or wildcard in the word box exists in the correct ordering position, then the attached agreement boxes must be enforced. For instance, if you wanted to build a query specifying that word A must agree in gender, case, and number with the word immediately preceding it only if that preceding word is an article, you would have to use an agreement condition word box to represent the article. Another example requiring an agreement condition word box is a query where word A precedes word B and we want to require that if a word C or D occurs between A and B, then C or D must agree with A and B in person, without requiring that every word between A and B agree with A or B. We would use an agreement condition word box to represent C and D. Note that the agreement condition word box for C and D does not require words C or D to occur. It merely specifies that if C or D occurs, the connected agreement box must be satisfied. Agreement condition word boxes are only tested in the third query processing phase and are ignored during the first and second phases. Like range filter word boxes, agreement condition word boxes are not connected to merge boxes.
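The article example above can be sketched as a conditional check: if the preceding word does not exist or does not satisfy the condition word box, nothing is enforced; otherwise the agreement must hold. This Python sketch assumes single-verse scope and dictionary parses; `conditional_agreement_ok`, `is_article`, and the attribute names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Parse = Dict[str, str]  # hypothetical: attribute name -> value


def conditional_agreement_ok(words: List[Parse], a_index: int,
                             condition: Callable[[Parse], bool],
                             attrs: Sequence[str]) -> bool:
    """Word A must agree with the word just before it, but only if that
    word exists (single verse) and satisfies the condition word box."""
    if a_index == 0:
        return True            # no preceding word: nothing to enforce
    prev, a = words[a_index - 1], words[a_index]
    if not condition(prev):
        return True            # condition not met: agreement not required
    return all(prev.get(k) is not None and prev.get(k) == a.get(k) for k in attrs)


def is_article(p: Parse) -> bool:
    return p.get("pos") == "article"


GCN = ("gender", "case", "number")
noun = {"pos": "noun", "gender": "masc", "case": "nom", "number": "sg"}
fem_article = {"pos": "article", "gender": "fem", "case": "nom", "number": "sg"}
verb = {"pos": "verb"}

print(conditional_agreement_ok([fem_article, noun], 1, is_article, GCN))  # False
print(conditional_agreement_ok([verb, noun], 1, is_article, GCN))         # True
print(conditional_agreement_ok([noun], 0, is_article, GCN))               # True
```

The two early `return True` branches are what distinguish an agreement condition word box from a normal word box: the absence of the conditioning word never causes a verse to fail.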

 

§         The Invert Results/NOT Operator

When you want to search for verses that do not contain a particular word, or when you want to search for phrases, but exclude words from the phrase, you will need to use the Invert results (NOT) option in the word box. This flag is easy to understand, but the details of how it works contain subtleties that we will explain in this section. When checked, this flag works in two different ways.

 

Generally there are two reasons for wanting to "NOT" a word box.  Here's an example of each of the two types of queries:

 

1.       Find all verses containing the word "Lord" but NOT the phrase "Lord God".

2.       Find all verses containing the word "Lord" but NOT the word "God".

In each query you would want to NOT the word box containing "God", but notice the differences. In the first query a verse can contain the word "God" as long as "God" does not immediately follow "Lord".  In the second query any verse containing the word "God" must be eliminated from the verse list.

 

Now let’s look at how the NOT option is processed during the different query processing phases.

 

1.       During the Boolean Operation phase of the query processing, all verses containing words that match the specification in the invert/NOT word box are placed in the verse list for that word box. The verse list compiled at this invert/NOT word box is then inverted (all verses not in the list are included, and all verses in the list are excluded). The result is that for this invert/NOT word box, the GSE constructs a list of all verses that do not contain any occurrence of a word matching the specifications in the invert/NOT word box. There is, however, an exception to this: If the invert/NOT word box has ordering or agreement links, then the invert/NOT word box is completely skipped during the Boolean Operation phase. The reason for this exception is that an invert/NOT word box that appears in a phrase (ordering) or in agreement conditions should not eliminate a given verse if a word matching the specification in the invert/NOT word box appears somewhere else in the verse. There may be a combination of words in the given verse that satisfy the invert/NOT condition and the ordering or agreement conditions connected to the word box, in spite of the existence of other words in the verse that match the invert/NOT word box's specification.

This is somewhat complicated to describe, but the following example should help clarify the point. Say for example we have a query that searches for all verses containing "Lord" but not the phrase "Lord God".  Hit verses can contain the word "God" as long as "God" does not immediately follow "Lord".  The GSE query (see above) would contain two word boxes: one for "Lord" and one describing NOT "God". These two word boxes would be connected by an ordering link, specifying that the "Lord" word box must precede the NOT "God" word box. Thus, the NOT "God" word box will be ignored during the Boolean Operation phase, allowing verses that contain the word "Lord" to be examined. This is necessary since a verse that contains "God" should not be eliminated as long as "God" does not immediately follow "Lord".  If the GSE were to eliminate all verses that contained the word "God", verses with possible hits would be unnecessarily eliminated (such as "And Abraham...called there on the name of the LORD, the everlasting God.").

2.       During the third phase of query processing (the Verse Test phase), a match is made with a specific word in the text if the word box specification does not match the word in the text. In the case of inclusion/exclusion lists, the roles of the inclusion list and exclusion list are swapped. Note that in this phase, we are searching through the specific words in a verse. The goal during this phase is to map the word boxes to specific words in the text of a verse.
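Both behaviors of the invert/NOT flag can be sketched in Python. The first function models the Boolean Operation phase (invert the verse list, or skip the box when it has ordering or agreement links); the second models the Verse Test phase for "Lord" followed immediately by NOT "God", where the NOT box must map to an actual word that does not match "God". The function names, the toy verse set, and the case-insensitive, punctuation-stripping comparison are assumptions made for illustration only.

```python
import string
from typing import FrozenSet, Optional


def boolean_phase_not(all_verses: FrozenSet[str], verse_list: FrozenSet[str],
                      has_links: bool) -> Optional[FrozenSet[str]]:
    """Verse list a NOT word box contributes in the Boolean Operation phase.
    Returns None when the box has ordering/agreement links and is skipped."""
    if has_links:
        return None
    return all_verses - verse_list       # invert: verses WITHOUT the word


def lord_not_followed_by_god(verse: str) -> bool:
    """Verse Test phase for '"Lord" followed immediately by NOT "God"'.
    The NOT box must map to a real word that does not match "God"."""
    words = [w.strip(string.punctuation).lower() for w in verse.split()]
    for i, word in enumerate(words[:-1]):   # the last word has no successor
        if word == "lord" and words[i + 1] != "god":
            return True
    return False


print(sorted(boolean_phase_not(frozenset({"v1", "v2", "v3"}), frozenset({"v2"}), False)))
# ['v1', 'v3']
print(lord_not_followed_by_god(
    "And Abraham...called there on the name of the LORD, the everlasting God."))  # True
print(lord_not_followed_by_god("the LORD God formed man"))  # False
```

The Abraham verse shows why the linked NOT box must be skipped in the Boolean phase: it contains "God", so a plain inversion would have discarded it before the Verse Test phase could find the valid "LORD, the" pairing.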