Thread: BW and Concordance (statistics) question

  1. #1
    Join Date
    Apr 2004
    Posts
    120

    BW and Concordance (statistics) question

    Hello,

    I have a little hurdle I am trying to overcome, hopefully someone can assist me :)

    I am working on a small database for a flashcard program based on the vocabulary list in the back of A Reader's Hebrew-English Lexicon of the Old Testament. I want to list the words by frequency (most common to least common), so I need to determine the frequency of certain words (every word that occurs 50x or more in the Hebrew Bible).

    Here is my problem: there are quite a few words that have identical forms but have different meanings. e.g. qr I (to call; ~739x) and qr II (to meet; ~136x). Therefore I wish to keep them as separate entries with their appropriate frequencies (for practical and statistical reasons). My problem is that when you do a search on a root in BW (I have 5.0) you get a count of all the forms, not just the word you are checking. Examples that are giving me a headache:


    la @a ta bAj ary alm ~[ hn[ arq h[r [r ~v

    Some of these, like m (m/am), can be resolved pretty easily by using the morphology tags--m is a noun and am is a particle.

    But words like l and qr are proving much more difficult, and since many of the words are from the same part of speech there does not seem to be an easy way to separate them. e.g.

    l
    al (neg particle) 729x
    el- (preposition) 5,518x
    el (nouns, I-V) 237x+

    qr
    qr I (verb) 739x
    qr II (verb) 136x

    In BW, is there a way to do an accurate frequency check on a word that shares a form with other words?

    There are many words in the Hebrew Bible that share similar spellings, one being very frequent and the other(s) being very infrequent, and not being able to separate them artificially inflates the number of occurrences, e.g.

    bt 600 = bt I 587 / bt II 13

    lp 507 = lp I 496 / lp II 11

    tsb 501 = tsb I 487 / tsb II 14

    Also, being able to search a specific word (and not every word spelled the same) has advantages for study also. While there are times when you may want to look at every example of qr, there are many times you will only want to examine qr I and having to weed through the 136 entries of qr II can be time consuming. If you only want to study l V (god), having an artificially high number of hits and then having to weed through entries can cost time. So there is a practical purpose to my request for help :)

    I know that the Strong's-encoded versions allow you to make a distinction (e.g. qr I = H7121, qr II = H7122), but I am thinking the Hebrew and Greek databases should allow us to make these distinctions also. I am very hesitant to use the Strong's-encoded databases because they do not appear to be accurate due to how the system works (not a BW problem). The example below has two parts: in each, the first number is the count of appearances of qr in the MT (from a vocabulary building book) and the second is the count BW shows when you search on the Strong's number:

    (1) qr I 739x vs. H7121 689x

    (2) qr II 136x vs. H7122 16x

    875x (for the MT) compared to 805x (Strong's in BW) is a big difference. I did a search in the WTM database for the root qr and got 876x, so the BW WTM database and the list I have for the MT are almost identical, and the Strong's number search is significantly off.

    Ironically, I have never invested in a good Heb. or Greek concordance because of BW :) I have a few vocabulary guides, but I would prefer not to use their statistical information because I plan on making the list freely available on the net. One of the vocabulary guides I have is in the process of making a retail vocabulary program and I would not feel right distributing something for free that utilizes information that they worked hard to create that could possibly compete with them (and therefore possibly hurt their sales).

    If, on the off chance, this is not possible with BW, I would appreciate it if someone could point me to where I can obtain this information (e.g. a Hebrew concordance that lists words with the same spelling in separate entries)!

    One last question: In WTM I can do a search for all verbs that appear in the Hiphil:

    .*@vh*

    This provides a lot of good statistical information, but one thing I cannot find is the number of verbal roots that appear in the Hiphil without manually counting--this could be handy information. Am I just missing where this information is, or is it not currently available?

    Thanks for your time,
    Joshua Luna

  2. #2
    Join Date
    Mar 2004
    Posts
    344

    Quote Originally Posted by Joshua Luna
    In BW, is there a way to do an accurate frequency check on a word that shares a form with other words?
    The Hebrew morphology contains homonym tags. For example, each of the different words spelled arq is tagged with a different homonym marker (see the manual for details). I just looked quickly and I see two nouns tagged Ha and Hb and two verbs tagged Ha and Hb. A quick search for arq@v*Ha* will give you the stats on the first verbal homonym, etc.
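
    To make the wildcard idea concrete, here is a minimal Python sketch that counts entries matching a pattern shaped like arq@v*Ha*. It only mimics the pattern matching on made-up data: the tokens list and the tag letters after "@" are hypothetical, and this is not a description of how BibleWorks evaluates a query internally.

```python
from fnmatch import fnmatchcase

# Hypothetical "lemma@morphology" strings. The tag letters after "@" are
# invented for illustration and do not reproduce real WTM codes.
tokens = [
    "arq@vqp3msHa",
    "arq@vqi3msHa",
    "arq@vqp3msHb",
    "arq@ncmsaHa",
    "rma@vqp3ms",
]


def count_matches(pattern, items):
    """Count items matching a shell-style wildcard pattern (case-sensitive)."""
    return sum(1 for item in items if fnmatchcase(item, pattern))


# One count per verbal homonym of the same spelling.
first_verb_homonym = count_matches("arq@v*Ha*", tokens)
second_verb_homonym = count_matches("arq@v*Hb*", tokens)
print(first_verb_homonym, second_verb_homonym)
```

    The noun entry and the unrelated verb are skipped because the pattern pins both the spelling and the leading "v" of the tag.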

    Quote Originally Posted by Joshua Luna
    I am very hesitant to use the Strong's-encoded databases because they do not appear to be accurate due to how the system works (not a BW problem).
    You are correct. The Strong's-tagged translations (KJV, NAS, etc.) will not give you accurate statistics regarding the original text. That's what you use the original text for.

    Quote Originally Posted by Joshua Luna
    One last question: In WTM I can do a search for all verbs that appear in the Hiphil:

    .*@vh*

    This provides a lot of good statistical information, but one thing I cannot find is the number of verbal roots that appear in the Hiphil without manually counting...

    1. Perform the search and then open the Word List Manager (WLM).
    2. Click Load or Generate Word List.
    3. Under Source choose Load highlighted words from last query. Make sure the Keep morph codes check box is not checked.
    4. Click Create list.

    You now have a list of each verb that occurs in the hiphil (486 total). You can sort them alphabetically or by frequency and you can save the list as an inclusion/exclusion list for use in the ASE.
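
    For anyone doing the same count outside BW, the effect of leaving "Keep morph codes" unchecked can be sketched in a few lines of Python: drop the morphology code from each hit and count what is left. The hits list below is placeholder data (the lemma names and codes are hypothetical), so this is only an illustration of the idea, not of the WLM itself.

```python
from collections import Counter

# Hypothetical search hits as (lemma, morphology code) pairs; the lemma names
# and codes are placeholders, not real WTM data.
hits = [
    ("lemma_a", "vhp3ms"),
    ("lemma_a", "vhi3mp"),
    ("lemma_b", "vhp3fs"),
    ("lemma_c", "vhi1cs"),
    ("lemma_a", "vhp3ms"),
]

# Dropping the morphology code collapses every inflected hit onto its lemma.
per_lemma = Counter(lemma for lemma, _morph in hits)

distinct_lemmas = len(per_lemma)        # number of different verbs found
by_frequency = per_lemma.most_common()  # most frequent verb first
print(distinct_lemmas, by_frequency)
```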

    This is just one of the many great uses of the WLM. You will be well rewarded if you spend some time familiarizing yourself with the WLM.



  3. #3
    Join Date
    Apr 2004
    Posts
    120

    Wow Charlie! That is exactly the information I needed--Thanks for the reply!

    You are correct about the WLM; I have been using it more and more. Recently, I manually made some vocabulary lists and copied info from TWOT for each word... when I found out the WLM would compile a lexicon--with your choice of reference(s)--I smacked my forehead and smiled. It would have saved me a LOT of time.

    I only skimmed my v.5 manual (I had been using 3.5 for quite a few years) and it seems I have missed quite a few very important pieces of information... I think I need to go back and read the manual! Thanks for your time, the great post, and the excellent product!

    Joshua

  4. #4
    Join Date
    Apr 2004
    Posts
    120

    Creating a list of all Hebrew lexemes that appear 50x in the BHS

    Hello,

    I have a new project and hope you can help.

    I am creating a web-based Hebrew/Greek flashcard training site. The general goal of the site, for each language, is to bring a user's vocabulary up to speed so that "Reader's Lexicons" like the ones offered by Busby-Armstrong-Carr (Hebrew) and Kubo (Greek) are more approachable. The goal for Hebrew is to have every word that appears 50x or more. I would like to do this in order of frequency, with learning units of 20 words (~35 lists) and review units of 100 words (~7 lists).
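
    As a rough sketch of that unit plan, here is how a frequency-ordered list could be cut into learning and review units in Python. The 700 placeholder words are an assumption back-calculated from "~35 lists" of 20; the real list length may differ.

```python
def chunk(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Placeholder vocabulary, already in frequency order. The total of 700 is an
# assumption implied by roughly 35 lists of 20 words.
words = [f"word_{rank:03d}" for rank in range(1, 701)]

learning_units = chunk(words, 20)   # units of 20 words
review_units = chunk(words, 100)    # reviews of 100 words
print(len(learning_units), len(review_units))
```

    Because both splits walk the same ordered list, each review unit covers exactly five consecutive learning units.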

    Now "The Reader's Hebrew-English Lexicon of the Old Testament" has vocabulary lists in the appendix (and uses BDB for definitions). So I have the correct vocabulary one needs to learn to make the book useful, and definitions that I can use (BDB is out of copyright).

    The problem is the frequencies. Most vocabulary guides that list words in frequency order are under copyright, which prevents any reproduction of the book. I got the idea of using BibleWorks from "The Vocabulary Guide to Biblical Hebrew", as the preface notes that the authors derived their frequencies using Accordance. Surely BW can do the same.

    But I am having a tough time getting the results I am after. Based on the works by Pratico, Landes, Mitchel, etc., the top 10 words should be something like this:

    vav (conjunction),
    he (definite article),
    lamed (preposition),
    bet (preposition),
    'et (definite direct object marker),
    min (preposition),
    YHWH (noun, divine name Yahweh),
    `al (preposition),
    'el (preposition),
    'asher (relative pronoun),
    kol (noun, all, each, every)

    So that should give you an idea of what I am trying to do.

    The script is completed and I have the plan of attack down. The final part is the frequencies, so I can list the words correctly.

    So in BW (version 5) I am trying to:

    -List all unique lexemes (root word, all conjugations, distinction between homonyms)
    -In order of most common to least common, for all words that appear 50x or more in the Hebrew Bible
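
    In code terms, the two requirements above amount to a filter and a sort over per-lexeme counts. A minimal Python sketch, assuming the counts are already available as a mapping (the lexeme names and numbers below are placeholders, not real frequencies):

```python
from collections import Counter


def ranked_lexemes(counts, min_count=50):
    """Return (lexeme, count) pairs with count >= min_count, most common first."""
    return [(lex, n) for lex, n in counts.most_common() if n >= min_count]


# Placeholder counts; real values would come from the Hebrew database.
counts = Counter({
    "lexeme_a": 120,
    "lexeme_b": 49,
    "lexeme_c": 50,
    "lexeme_d": 300,
})

ranked = ranked_lexemes(counts)
print(ranked)
```

    A lexeme at exactly 50x is kept and one at 49x is dropped, which matches the "50x or more" cutoff.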

  5. #5

    WLM

    If I understand what you are looking for (which sounds like a very helpful project) here is what I would do:

    1. Open the Word List Manager

    2. Click on Load or Generate Word List

    3. Select WTM as your version

    4. Set Source = "Load words from Bible Version"

    5. Range should be set to include entire OT

    6. Filter Words should be set to * only

    7. Keep Morph codes should be UNchecked

    8. Keep Greek accents and Hebrew Vowel point should be CHECKED

    9. Click create list.

    10. Once the list is created click sort - Sort by Frequency

    11. Click on File - Export as RTF

    You can then open the exported file in Word or a similar program, and you will have all the lemmas and their frequencies.

    You will see that your top 10 list is correct.

    Hope this helps.
    Joe Fleener


  6. #6
    Join Date
    Apr 2004
    Posts
    120

    Thanks Joe and Charlie!

    I should have taken Charlie's advice to play with the WLM more (been busy...)

    Joe: Your advice produced what I wanted -- with one small issue: homonyms. But based on the information Charlie gave earlier and your very detailed, helpful steps, I was able to first generate all the words, and then make a secondary list of all words that have at least one homonym (Hb).

    There are only 37 words that have at least one homonym and appear ~50x or more. This means I only need to search those 37 words manually to find the frequencies for Ha, Hb, Hc, etc., then re-order them and list only the entries that appear 50x or more.
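
    The re-ordering step can be sketched in Python as follows: swap each merged spelling for its per-homonym entries, drop anything under 50x, and sort again. The figures are the ones quoted earlier in this thread, with the labels kept as they appear there; the function name and data layout are just assumptions for illustration.

```python
THRESHOLD = 50

# Merged counts (one entry per spelling) and per-homonym counts, using figures
# quoted earlier in this thread; labels are kept as they appear there.
merged = {"qr": 875, "bt": 600, "lp": 507}
split = {
    "qr": {"I": 739, "II": 136},
    "bt": {"I": 587, "II": 13},
    "lp": {"I": 496, "II": 11},
}


def resplit(merged_counts, homonym_counts, threshold=THRESHOLD):
    """Replace merged spellings with per-homonym entries, filter, and re-sort."""
    entries = {}
    for form, total in merged_counts.items():
        parts = homonym_counts.get(form)
        if parts is None:
            entries[form] = total  # no homonyms: keep the merged count
        else:
            for label, n in parts.items():
                entries[f"{form} {label}"] = n
    kept = [(name, n) for name, n in entries.items() if n >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


final = resplit(merged, split)
print(final)
```

    With these numbers only qr keeps both entries; the second homonyms of bt and lp fall below the cutoff and drop out.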

    If anyone is interested, Joe gave the first search. The second search, based on the info Charlie provided, was:

    *@*Hb*

    Now don't I feel like a bonehead for not figuring that out myself!

    Again, thank you Joe for your speedy and spot on assistance! I am hoping this project will be a blessing to others. The project I am working on is PHP. I had considered Flash because you avoid the font issues, but then run into the hurdle of some people have it and others do not. I am not proficient at working with Flash with MySQL, so I am using a modified PHP script.

    I mention this because look what I found last night:

    http://home.earthlink.net/~vikn/

    John Allred has made a REALLY nice Flash based flash card program. Some of the really nice features I found were

    - You can mix multiple flash card "stacks"
    - You can take quizzes, practice, use it like a flash card, or preview the word list for study
    - Multiple choice and/or fill in the blank options (it seems to require only part of a definition to work also! So on longer words you need only give the base definition)
    - Words are broken up by frequency into numerous groups => nouns, verbs, types of verbs (strong, hollow, etc.), numbers, pronouns, pronominal suffixes

    This tool is for Hebrew and Greek. Requires Flash Player 6.

    A very nice, powerful tool. I am afraid my effort won't be quite as flexible, but I hope to offer a similar array of lists and helps.
