Joshua Luna
05-01-2004, 05:24 PM
Hello,
I have a little hurdle I am trying to overcome; hopefully someone can assist me :)
I am working on a small database for a flashcard program based on the vocabulary list in the back of “A Reader’s Hebrew-English Lexicon of the Old Testament”. I want to list the words by frequency (most common to least common), so I need to determine the frequency of certain words (every word that occurs 50x or more in the Hebrew Bible).
Here is my problem: there are quite a few words that have identical forms but have different meanings. e.g. qr’ I (to call; ~739x) and qr’ II (to meet; ~136x). Therefore I wish to keep them as separate entries with their appropriate frequencies (for practical and statistical reasons). My problem is that when you do a search on a root in BW (I have 5.0) you get a count of all the forms, not just the word you are checking. Examples that are giving me a headache:
la @a ta bAj ary alm ~[ hn[ arq h[r [r ~v
Some of these, like šm (šêm/šam), can be resolved pretty easily by using the morphology tags--šêm is a noun and šam is a particle.
But words like ’l and qr’ are proving much more difficult, and since many of the words are from the same part of speech there does not seem to be an easy way to separate them. e.g.
’l
’al (neg particle) 729x
’el- (preposition) 5,518x
’el (nouns, I-V) 237x+
qr’
qârâ’ I (verb) 739x
qârâ’ II (verb) 136x
In BW, is there a way to do an accurate frequency check on a word that shares a form with other words?
There are many words in the Hebrew Bible that share similar spellings, one being very frequent and the other(s) being very infrequent, and not being able to separate them artificially inflates the number of occurrences, e.g.
bt 600 = bt I 587 / bt II 13
’lp 507 = ’lp I 496 / ’lp II 11
tsb’ 501 = tsb’ I 487 / tsb’ II 14
Being able to search for a specific word (and not every word spelled the same) also has advantages for study. While there are times when you may want to look at every example of qârâ’, there are many times you will only want to examine qârâ’ I, and having to weed through the 136 entries of qârâ’ II can be time-consuming. Likewise, if you only want to study ’l V (god), an artificially high number of “hits” that you then have to weed through costs time. So there is a practical purpose to my request for help :)
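As an illustration of the kind of count being asked for, here is a minimal Python sketch. It is not a BW feature: it assumes the hits could be exported as one (root, homonym number) pair per occurrence, and the few rows in `hits` are made-up placeholders rather than real search output. The names `count_by_homonym` and `count_by_spelling` are hypothetical.

```python
from collections import Counter

# Hypothetical hit list: one (root, homonym_number) pair per occurrence.
# These rows are placeholders for illustration, not real search results.
hits = [
    ("qr'", "I"),
    ("qr'", "I"),
    ("qr'", "II"),
    ("bt", "I"),
    ("bt", "II"),
    ("bt", "I"),
]


def count_by_homonym(hits):
    """Count occurrences per (root, homonym), keeping homonyms apart."""
    return Counter(hits)


def count_by_spelling(hits):
    """Count occurrences per spelling only, lumping homonyms together."""
    return Counter(root for root, _ in hits)


separate = count_by_homonym(hits)
combined = count_by_spelling(hits)

# Show each homonym's share of the total for its spelling.
for (root, homonym), n in sorted(separate.items()):
    print(f"{root} {homonym}: {n} of {combined[root]} hits for this spelling")
```

The point of the sketch is only that the separate counts need a homonym number attached to every hit; without it, the combined figure is all that can be computed.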
I know that the Strong’s encoded versions allow you to make a distinction (e.g. qârâ’ I = H7121, qârâ’ II = H7122), but I am thinking the Hebrew and Greek databases should allow us to make these distinctions also. I am very hesitant to use the Strong’s encoded databases because they do not appear to be accurate due to how the system works (not a BW problem). In each line of the example below, the first figure is the number of appearances of qr’ in the MT (from a vocabulary building book) and the second is the number BW shows when you do a search on the Strong’s number:
(1) qr’ I 739x vs. H7121 689x
(2) qr’ II 136x vs. H7122 16x
875x (for the MT) compared to 705x (Strong’s in BW) is a big difference. I did a search in the WTM database for the root qr’ and got 876x, so the BW WTM database and the list I have for the MT are almost identical, while the Strong’s number search is significantly off.
Ironically, I have never invested in a good Heb. or Greek concordance because of BW :) I have a few vocabulary guides, but I would prefer not to use their statistical information because I plan on making the list freely available on the net. The makers of one of the vocabulary guides I have are in the process of making a retail vocabulary program, and I would not feel right distributing something for free that utilizes information they worked hard to create and that could possibly compete with them (and therefore possibly hurt their sales).
If on the off chance this is not possible with BW, I would appreciate it if someone could point me in the right direction on where I can obtain this information (e.g. a Hebrew concordance that lists words with the same spelling in separate entries)!
One last question: In WTM I can do a search for all verbs that appear in the Hif‘îl:
.*@vh*
This provides a lot of good statistical information, but one thing I cannot find is the number of verbal roots that appear in the Hif‘îl without manually counting--this could be handy information. Am I just missing where this information is or is it not currently available?
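For what it is worth, the count of distinct roots could be done outside BW if the hit list can be exported. The Python sketch below assumes each hit is available as a `lemma@tag` string, in the same shape as the search above; that export format, the sample rows, and the function name `distinct_roots` are all assumptions for illustration, not something BW is known to provide.

```python
# Hypothetical export: one "lemma@morphology-tag" string per hit.
# These rows are made-up samples, not real search output.
hits = [
    "bw'@vhp3ms",
    "bw'@vhi3ms",
    "ngd@vhw3ms",
    "nkh@vhp3ms",
    "ngd@vhi1cs",
    "ktb@vqp3ms",  # not Hif'il, so it should be ignored
]


def distinct_roots(hits, stem_prefix="vh"):
    """Return the set of lemmas whose tag starts with the given stem prefix."""
    roots = set()
    for hit in hits:
        lemma, _, tag = hit.partition("@")
        if tag.startswith(stem_prefix):
            roots.add(lemma)
    return roots


roots = distinct_roots(hits)
print(len(roots), sorted(roots))  # 3 ["bw'", 'ngd', 'nkh']
```

Using a set means each root is counted once no matter how many times it occurs, which is exactly the figure that is tedious to get by hand.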
Thanks for your time – Joshua Luna