
Thread: Why this difference in stats?

  1. #1
    Join Date
    Aug 2004
    Posts
    208

    Default Why this difference in stats?

    Hi all,
    Can someone explain this to me? When I search in the NT for:
    - euaggelion, I get 76 hits.
    - euaggelizo, I get 54 hits, for a total of 130 hits.

    But when I search for /euaggelion euaggelizo, I get 136 hits. Isn't an OR search simply a matter of adding the hits for the two words? Why 136, then, when the individual searches give only 130?

    Thanks,
    Morten

  2. #2
    Join Date
    Jan 2008
    Posts
    214

    Default

    Hi Morten,

    I tried your searches, using BNM, and got the same count for the separate searches, but for the OR search I got 120 verses, 34 forms, 130 hits -- matching the total of the separate searches.

    I don't understand where your 136 hits came from. Using BGM (LXX + BNT) I got 154 hits; in both BYM and SCM I got 132 hits.

    Which morphological version were you using?

    --Jim

  3. #3
    Join Date
    Aug 2004
    Posts
    208

    Default

    This is truly strange:
    - BGM, limits NT, /euaggelion euaggelizo = 120 verses, 34 forms, 136 hits.
    - BNM = 120 verses, 34 forms, 136 hits.

    [Attached screenshots: bgm.PNG, BNM.PNG]

    I also tried restarting, updating, and so on.
    - euaggelion = 76
    - euaggelizo = 54, so 130 combined.
    - but /euaggelion euaggelizo = 136.

    What can explain this?

    Morten

  4. #4
    Join Date
    Jan 2008
    Posts
    214

    Default

    There is an obscure option in BW:
    Tools
    Options
    General
    Flags
    Command Line Search Options
    Compute Search Window hits by permutation.

    If this option is checked, the hit count is 136.
    If this option is not checked, the hit count is 130.

    This option only affects hit counts when searching for more than one word.
    I recommend keeping it turned off -- there may be cases where it is useful, but I find it confusing.

    BW help says:
    Compute Search Window hits by permutation
    If this option is set, Command Line search hits will be computed by permutation. Statistics for hits will include the various combinations of words possible with the search criteria. If this is turned off, the number of hits will be the same as the number of words highlighted.


    --Jim

  5. #5
    Join Date
    Aug 2004
    Posts
    208

    Default

    Thanks Jim, amazing.

    If I understand this correctly, BW permutes the verses where both words occur? There are 5 verses with both, and then it counts those twice? That should only give 135, though.

    I for one find this option really confusing. At the very least it should not be switched on by default. If it is the default (and not something I switched on myself somewhere down the road), MANY BW users have ended up with wrong stats.

    Morten

  6. #6
    Join Date
    Jan 2008
    Posts
    214

    Default

    I have no idea what value for this option BW ships with.

    I just revisited another thread that touched on the issue of permutation counts, as well as smart apostrophes: http://www.bibleworks.com/forums/sho...Spicer-video-2

    From that it appears that having permutations on increases the count for OR queries, and decreases the count for AND queries. I can see some cases where it might be useful for AND queries, but cannot see the usefulness for OR. And I agree that permutations off should be the default; only in very special situations is it useful.

    Also, it would be nice if there were some indication displayed when this option is in use, maybe a note with the statistics in the status bar.

    --Jim

  7. #7
    Join Date
    Aug 2004
    Posts
    208

    Default

    This is, in my mind, more than a minor issue. I wonder how many users, students and scholars alike, have ended up relying on faulty stats for this reason.

    Friends at BW, please consider changing this practice.


    Morten

  8. #8
    Join Date
    Feb 2015
    Posts
    1

    Default

    Originally posted by MortenJensen:
    Thanks Jim, amazing.

    If I understand this correctly, BW permutes the verses where both words occur? There are 5 verses with both, and then it counts those twice? That should only give 135, though.

    I for one find this option really confusing. At the very least it should not be switched on by default. If it is the default (and not something I switched on myself somewhere down the road), MANY BW users have ended up with wrong stats.

    Morten

    A little more on this, in case someone comes looking later and doesn't understand the permutations thing. It was not obvious to me what was going on with it (and still isn't obvious!).

    The difference between 130 and 136 in the OR search is this:

    The 130 "un-permuted hit count" is supposed to count the number of highlighted words, and it does. Verify by taking the number of verses (120) and adding 1 for each *2 verse and 2 for each *3 verse, which gives 130. In other words, treat every un-starred verse in the search results window as having a *1 by it, then sum all the *1, *2, *3, etc.

    The 136 "permuted hit count" is confusing, but you arrive at it the same way: take 120 and add 1 for each *2, 2 for each *3, 3 for each *4, and so on, which gives 136. Again, that is just the sum of all the star values, with unmarked verses counted as *1.

    The same works for the AND search, permuted or not permuted. Just add up the displayed hits, *1, *2, etc. and you get the number of hits.
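
    As a quick check of that arithmetic, here is a small Python sketch. The star distributions are my own tally of the nine multi-hit verses listed in this post (every other verse counted as *1), and the names `unpermuted`, `permuted`, and `total_hits` are just for illustration.

```python
# Number of verses carrying each star value in the OR search.
# Assumption: taken from the nine multi-hit verses in this post;
# the remaining 111 verses are treated as *1.
unpermuted = {1: 111, 2: 8, 3: 1}
permuted = {1: 111, 2: 4, 3: 4, 5: 1}


def total_hits(star_counts):
    """Sum star value times number of verses with that star value."""
    return sum(star * verses for star, verses in star_counts.items())


for label, dist in (("un-permuted", unpermuted), ("permuted", permuted)):
    verses = sum(dist.values())
    print(f"{label}: {verses} verses, {total_hits(dist)} hits")
# un-permuted: 120 verses, 130 hits
# permuted: 120 verses, 136 hits
```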

    Now what does this MEAN? The question is, how does BW compute the *-number like *2, *3, and *5 next to each verse? Here's where I'm doing some reverse engineering and guesswork. In the permuted-OR search, there are 120 verses, 9 marked with a star. In the un-permuted-OR search, there are 120 verses, also 9 marked with a star, but sometimes with a different star value.

    The verses with a single matching word in them are straightforward. Count them as a *1 in both the permuted and un-permuted cases. There are 111 of them in the OR search. Watch for the number 111 in the last row of the table below. The AND search does not have these 111 verses, nor the 4 verses with just nn or vv.

    The verses with two or more words are complicated. The 9 verses that have two or more matching words are shown in the following table, with v = verb form euaggelizw, and n = noun form euaggelion.

    Word pattern | Permuted OR search | Un-permuted OR search | Permuted AND search | Un-permuted AND search
    vn1n2 (1 Cor. 9:18) | *5 = following matches can occur: v, n1, n2, vn1, and vn2 (if n1 didn't exist, vn2 would match the OR search pattern) | *3 = number of matching words found in the verse, no permutations | *2 = matches vn1 and vn2 (n1 doesn't need to exist for there to be a match of the AND pattern) | *3 = number of matching words found in the verse, no permutations
    nv (1 Cor. 15:1) | *3 = following matches can occur: n, v, nv | *2 = number of matching words found in the verse, no permutations | *1 = only one permutation matches | *2 = similar to above
    nv (2 Cor. 11:7) | Ditto | Ditto | Ditto | Ditto
    nv (Gal 1:11) | Ditto | Ditto | Ditto | Ditto
    nv (Rev. 14:6) | Ditto | Ditto | Ditto | Ditto
    nn (1 Cor. 9:14) | *2 = following matches can occur: n1, n2 | Ditto | N/A - nn doesn't match AND search | N/A - nn doesn't match AND search
    vv (1 Cor. 9:16) | *2 = following matches can occur: v1, v2 | Ditto | N/A - vv doesn't match AND search | N/A - vv doesn't match AND search
    vv (Gal 1:8) | Ditto | Ditto | N/A - vv doesn't match AND search | N/A - vv doesn't match AND search
    nn (Phi 1:27) | Ditto | Ditto | N/A - nn doesn't match AND search | N/A - nn doesn't match AND search
    Total | 111 + 25 = 136 hits | 111 + 19 = 130 hits | 6 hits | 11 hits
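
    One way to read the table is as a simple counting rule. The Python sketch below is only a guess fitted to these nine verses, not BibleWorks' actual algorithm: it assumes permuted OR = v + n + v*n, permuted AND = v*n, and un-permuted = number of highlighted words, where v and n are the verb and noun occurrences in a verse. The names `star_value`, `MULTI`, and `total_hits` are made up for the example.

```python
def star_value(v, n, mode, permuted):
    """Hits for one verse with v verb and n noun occurrences (guessed rule)."""
    if mode == "AND" and (v == 0 or n == 0):
        return 0  # verse does not match the AND pattern at all
    if not permuted:
        return v + n  # simply the number of highlighted words
    pairs = v * n  # every verb/noun combination in the verse
    return pairs if mode == "AND" else v + n + pairs


# (verbs, nouns) for the nine multi-word verses in the table above
MULTI = [(1, 2)] + [(1, 1)] * 4 + [(0, 2), (2, 0), (2, 0), (0, 2)]
SINGLE_WORD_VERSES = 111  # each is *1 in an OR search, absent from an AND search


def total_hits(mode, permuted):
    total = sum(star_value(v, n, mode, permuted) for v, n in MULTI)
    if mode == "OR":
        total += SINGLE_WORD_VERSES
    return total


for mode in ("OR", "AND"):
    for permuted in (True, False):
        label = "permuted" if permuted else "un-permuted"
        print(mode, label, total_hits(mode, permuted))
# OR permuted 136
# OR un-permuted 130
# AND permuted 6
# AND un-permuted 11
```

    The rule reproduces all four totals, but nine verses is a small sample, so treat it as a working hypothesis.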

    I don't see any usefulness of the permuted search for Bible study. It may simply be that it was easy to generate this information as an artifact of the search algorithm BW uses. I personally don't see how that would be easy, but then again I don't know their internal data structure and algorithm for doing the search. In any case, the verse-level permutation computation is somewhat arbitrary, since we have to go up a level to the paragraph to get at the full contextual meaning.

    Maybe someone can verify that what I've figured out above actually makes sense!

    Here is some scratch work for those interested to "dig in" some more:

    All searches in BNM, limits set to nt, and to start, Compute Search Window hits by permutation is checked.

    /euaggelion ==> 73 verses, 4 forms, 76 hits
    /euaggelizw ==> 52 verses, 30 forms, 54 hits
    Sum = 125 verses, 34 forms, 130 hits
    /euaggelion euaggelizw ==> 120 verses, 34 forms, 136 hits

    The difference between the verse count of the two separate searches added together (125) and that of the OR search (120) suggests that 5 verses have both words, and indeed that is the case.
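
    That is just inclusion-exclusion on the verse counts. A tiny Python sketch, using the figures from the searches above (the variable names are mine):

```python
noun_verses = 73   # /euaggelion
verb_verses = 52   # /euaggelizw
or_verses = 120    # /euaggelion euaggelizw

# Verses counted in both single-word searches appear only once in the OR search.
both = noun_verses + verb_verses - or_verses
print(both)  # 5
```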

    .euaggelion euaggelizw ==> 5 verses, 34 forms, 6 hits

    It is curious that the number of forms in this search is 34. The search result actually shows 6 unique forms of the word. It appears that "forms" is simply the sum of the form counts from the two individual searches, and that this number is not pared down when the two searches are combined to keep only the verses that contain both words.

    But back to the point. The 5 verses that have some form of the noun and some form of the verb are:

    1 Co. 9:18 - 1 verb form and 2 noun forms
    1 Co. 15:1 - 1 noun and 1 verb
    2 Co. 11:7 - 1 noun and 1 verb
    Gal. 1:11 - 1 noun and 1 verb
    Rev. 14:6 - 1 noun and 1 verb

    There are another 4 verses that have two or more of the noun form or verb form, but not both the noun and the verb:

    1 Cor. 9:14 - 2 nouns
    1 Cor. 9:16 - 2 verbs
    Galatians 1:8 - 2 verbs
    Philippians 1:27 - 2 nouns

    With Compute Search Window hits by permutation UNchecked:
    /euaggelion euaggelizw ==> 120 verses, 34 forms, 130 hits (hits decreased = # of highlighted search words)
    .euaggelion euaggelizw ==> 5 verses, 34 forms, 11 hits (hits increased = # of highlighted search words)

  9. #9

    Default

    @postiffm: Thanks! That actually makes a little better sense of the permutation deal. I agree that it's probably best to use un-permuted searches.
    Mark G. Vitalis Hoffman
    Professor of Biblical Studies
    Lutheran Theological Seminary at Gettysburg
    ltsg.edu - CrossMarks.com
    Biblical Studies and Technological Tools

  10. #10
    Join Date
    Mar 2004
    Posts
    841

    Default Statistics on OR searches

    Hi All,

    This seems to be a somewhat heated topic. But let me at least give you the rationale for the way we have done it. Well, to be blunt, it is the proper way to report search statistics. A search is a way of specifying a pattern. The results should report how many times the pattern occurs. And that is exactly what our default statistics do.

    This may not make sense to some people for a simple two-word OR search, but as searches get more complex, like a multilevel search with AND branches and OR branches, it becomes the most informative way of reporting the results. If you do a complex search and turn off permutation calculations, the hit results are very difficult to understand. But with permutations on you can assign a definite meaning to the hit results. I don't think anyone familiar with common methods of the statistical analysis of data and of pattern searching would be at all surprised. Maybe we haven't done the best job of explaining the matter, but I do believe that what we have done is the proper method.

    Think about complex grammatical searches. You specify the pattern via a search specification. It is useful to know how many ways that pattern can be matched in a verse because it tells you how many cases you need to examine. There may be multiple ways to see the pattern in a single verse. If all someone is interested in is single-word statistics, then that is easily available to them in BibleWorks. But it is not the only way of thinking about search statistics.

    To see what is going on, think about a simple AND search, say "in Christ". With permutations turned on, if a verse has one occurrence of the word "Christ" and two of "in", you will get two hits. With permutations turned off you get one. Now which is more informative? With permutations on you learn that there are two ways that the combination of "in" and "Christ" can be realized. This may not be terribly helpful in this case, but in a highly inflected language like Greek it can be very informative.
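
    The permuted figure in that example can be pictured as enumerating every way of picking one occurrence of each search term. The Python sketch below is only an illustration of that idea, not BibleWorks code; the function name `and_permutations` and the toy word list are made up.

```python
from itertools import product


def and_permutations(verse_words, terms):
    """List every way to pick one occurrence (by position) of each term."""
    positions = [
        [i for i, word in enumerate(verse_words) if word == term]
        for term in terms
    ]
    return list(product(*positions))


# Toy word list (not a real verse): two "in", one "Christ".
verse = ["in", "faith", "and", "in", "Christ"]
combos = and_permutations(verse, ["in", "Christ"])
print(combos)       # [(0, 4), (3, 4)]
print(len(combos))  # 2
```

    If any term is missing from the verse, its position list is empty and the product is empty too, so the verse contributes no hits.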

    I hope this helps.

    Mike
    Last edited by MBushell; Yesterday at 08:53 PM.
