
BW and Concordance (statistics) question



Joshua Luna
05-01-2004, 05:24 PM
Hello,

I have a little hurdle I am trying to overcome, hopefully someone can assist me :)

I am working on a small database for a flashcard program based on the vocabulary list in the back of “A Reader’s Hebrew-English Lexicon of the Old Testament”. I want to list the words by frequency (most common to least common), so I need to determine the frequency of certain words (every word that occurs 50x or more in the Hebrew Bible).

Here is my problem: there are quite a few words that have identical forms but have different meanings. e.g. qr’ I (to call; ~739x) and qr’ II (to meet; ~136x). Therefore I wish to keep them as separate entries with their appropriate frequencies (for practical and statistical reasons). My problem is that when you do a search on a root in BW (I have 5.0) you get a count of all the forms, not just the word you are checking. Examples that are giving me a headache:


la @a ta bAj ary alm ~[ hn[ arq h[r [r ~v

Some of these, like šm (šēm/šam), can be resolved pretty easily by using the morphology tags--šēm is a noun and šam is a particle.

But words like ’l and qr’ are proving much more difficult, and since many of the words are from the same part of speech there does not seem to be an easy way to separate them. e.g.

’l
’al (neg particle) 729x
’el- (preposition) 5,518x
’el (nouns, I-V) 237x+

qr’
qârâ’ I (verb) 739x
qârâ’ II (verb) 136x

In BW, is there a way to do an accurate frequency check on a word that shares a form with other words?

There are many words in the Hebrew Bible that share similar spellings, one being very frequent and the other(s) being very infrequent, and not being able to separate them artificially inflates the number of occurrences, e.g.

bt 600 = bt I 587 / bt II 13

’lp 507 = ’lp I 496 / ’lp II 11

tsb’ 501 = tsb’ I 487 / tsb’ II 14
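
As a rough illustration of that inflation, the Python sketch below sums the per-homonym counts quoted above and shows what a spelling-only search would report. The dictionary layout and the name `combined_count` are assumptions made for this example; they are not part of BibleWorks.

```python
# Per-homonym counts quoted in the post, keyed by (spelling, homonym number).
# The data structure is illustrative only.
counts = {
    ("bt", "I"): 587, ("bt", "II"): 13,
    ("'lp", "I"): 496, ("'lp", "II"): 11,
    ("tsb'", "I"): 487, ("tsb'", "II"): 14,
}


def combined_count(spelling):
    """Total a spelling-only search would report for one written form."""
    return sum(n for (form, _), n in counts.items() if form == spelling)


for spelling in ("bt", "'lp", "tsb'"):
    print(spelling, combined_count(spelling))
```

Each combined total matches the figure on the left of the equals sign above, so the less frequent homonym is counted toward the more frequent one unless the two are separated.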

Also, being able to search a specific word (and not every word spelled the same) has advantages for study. While there are times when you may want to look at every example of qârâ’, there are many times you will only want to examine qârâ’ I and having to weed through the 136 entries of qârâ’ II can be time consuming. If you only want to study ’l V (god), having an artificially high number of “hits” and then having to weed through entries can cost time. So there is a practical purpose to my request for help :)

I know that the Strong’s encoded versions allow you to make a distinction (e.g. qârâ’ I = H7121, qârâ’ II = H7122), but I am thinking the Hebrew and Greek databases should allow us to make these distinctions also. I am very hesitant to use the Strong’s encoded databases because they do not appear to be accurate due to how the system works (not a BW problem). The example below has two parts; each compares the number of appearances of qr’ in the MT (from a vocabulary building book) with the number BW shows when you do a search on the Strong’s number:

(1) qr’ I 739x vs. H7121 689x

(2) qr’ II 136x vs. H7122 16x

875x (for the MT) compared to 805x (Strong’s in BW) is a big difference. I did search in the WTM database for the root qr’ and got 876x, so the BW WTM database and the list I have for the MT are almost identical and the Strong’s # search is significantly off.

Ironically, I have never invested in a good Heb. or Greek concordance because of BW :) I have a few vocabulary guides, but I would prefer not to use their statistical information because I plan on making the list freely available on the net. The people behind one of the vocabulary guides I have are in the process of making a retail vocabulary program, and I would not feel right distributing something for free that uses information they worked hard to create and that could compete with them (and therefore possibly hurt their sales).

If, on the off chance, this is not possible with BW, I would appreciate it if someone could point me in the right direction on where I can obtain this information (e.g. a Hebrew concordance that lists words with the same spelling in separate entries)!

One last question: In WTM I can do a search for all verbs that appear in the Hif‘îl:

.*@vh*

This provides a lot of good statistical information, but one thing I cannot find is the number of verbal roots that appear in the Hif‘îl without manually counting--this could be handy information. Am I just missing where this information is or is it not currently available?

Thanks for your time – Joshua Luna

Charlie
05-10-2004, 10:59 AM
In BW, is there a way to do an accurate frequency check on a word that shares a form with other words?

The Hebrew morphology contains homonym tags. For example, each of the different words spelled arq is tagged with a different homonym marker (see the manual for details). I just looked quickly and I see two nouns tagged Ha and Hb and two verbs tagged Ha and Hb. A quick search for arq@v*Ha* will give you the stats on the first verbal homonym, etc.
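
If you need to run that kind of search for several homonyms, the query strings can be generated rather than typed. The Python sketch below only builds strings in the arq@v*Ha* shape shown above; the function name `homonym_query` is hypothetical, and the assumption that other homonyms follow the same pattern with Hb, Hc, and so on comes from the tags mentioned in this reply, not from the manual.

```python
def homonym_query(lemma, pos="v", homonym="a"):
    """Build a search string shaped like arq@v*Ha* (lemma@<pos>*H<letter>*)."""
    return f"{lemma}@{pos}*H{homonym}*"


# One query per verbal homonym of arq (Ha and Hb, as reported above).
queries = [homonym_query("arq", "v", letter) for letter in "ab"]
print(queries)
```

Each string would still have to be pasted into the BW command line and the hit count read off by hand.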



I am very hesitant to use the Strong’s encoded databases because they do not appear to be accurate due to how the system works (not a BW problem).

You are correct. The Strong’s tagged translations (KJV, NAS, etc) will not give you accurate statistics regarding the original text. That’s what you use the original text for.



One last question: In WTM I can do a search for all verbs that appear in the Hif‘îl:

.*@vh*

This provides a lot of good statistical information, but one thing I cannot find is the number of verbal roots that appear in the Hif‘îl without manually counting...


1. Perform the search and then open the Word List Manager (WLM).
2. Click “Load or Generate Word List.”
3. Under “Source” choose “Load highlighted words from last query.” Make sure the “Keep morph codes” check box is not checked.
4. Click “Create list.”

You now have a list of each verb that occurs in the hiphil (486 total). You can sort them alphabetically or by frequency and you can save the list as an “inclusion/exclusion list” for use in the ASE.
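
Conceptually, leaving “Keep morph codes” unchecked collapses every inflected hit onto its lemma before counting. The Python sketch below illustrates that idea on made-up data; the lemma names and code strings are placeholders, not real WTM output, and this is not a description of how the WLM is implemented.

```python
from collections import Counter

# Hypothetical search hits as (lemma, morphology code) pairs.
# Both the lemmas and the codes are invented placeholders.
hits = [
    ("root_a", "code_1"),
    ("root_a", "code_2"),
    ("root_b", "code_1"),
    ("root_c", "code_3"),
    ("root_a", "code_1"),
]

# Drop the morphology code so each hit is counted under its lemma only.
lemma_freq = Counter(lemma for lemma, _code in hits)

distinct_lemmas = len(lemma_freq)      # number of different roots found
by_frequency = lemma_freq.most_common()  # most frequent lemma first

print(distinct_lemmas, by_frequency)
```

The count of distinct lemmas plays the role of the list total, and the sorted list corresponds to the WLM's frequency sort.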

This is just one of the many great uses of the WLM. You will be well rewarded if you spend some time familiarizing yourself with the WLM.

Joshua Luna
05-11-2004, 02:07 PM
Wow Charlie! That is exactly the information I needed--Thanks for the reply!

You are correct about the WLM; I have been using it more and more. Recently, I made some vocabulary lists by hand and copied info manually from TWOT for each word... when I found out the WLM would compile a Lexicon--with your choice of reference(s)--I smacked my forehead and smiled :) It would have saved me a LOT of time.

I only skimmed my v.5 manual (I had been using 3.5 for quite a few years) and it seems I have missed quite a few very important pieces of information... I think I need to go back and read the manual! Thanks for your time, the great post, and the excellent product!

Joshua

Joshua Luna
08-22-2005, 08:38 PM
Hello,

I have a new project and hope you can help :D

I am creating a web-based Hebrew/Greek flashcard training site. The general goal of the site for each language is to bring a user up to speed in their vocabulary to make "Reader's Lexicons" like the ones offered by Busby-Armstrong-Carr (Hebrew) and Kubo (Greek) more approachable. The goal, for Hebrew, is to have every word that appears 50x or more. I would like to do this in order of frequency, with learning units of 20 words (~35 lists) and review units of 100 words (~7 lists).

Now "A Reader's Hebrew-English Lexicon of the Old Testament" has vocabulary lists in the appendix (and uses BDB for definitions). So I have the correct vocabulary one needs to learn to make the book useful, and definitions that I can use (BDB is out of copyright).

The problem is the frequencies. Most vocabulary guides that list words in frequency order are copyrighted and prohibit any reproduction of their contents. I got the idea of using BibleWorks from "The Vocabulary Guide to Biblical Hebrew", as in the preface they note they derived their frequencies using Accordance. Surely BW can do the same :)

But I am having a tough time getting the results I am after. Based on the works by Pratico, Landes, Mitchel, etc... the top 10 words should be something like this:

vav (conjunction),
he (definite article),
lamed (preposition),
bet (preposition),
'et (definite direct object marker),
min (preposition),
YHWH (noun, divine name Yahweh),
`al (preposition),
'el (preposition),
'asher (relative pronoun),
kol (noun, all, each, every)

So that should give you an idea of what I am trying to do :)

The script is completed and I have the plan of attack down. So the final part is the frequencies so I can list the words correctly.

So in BW (version 5) I am trying to:

-List all unique lexemes (root word, all conjugations, distinction between homonyms)
-In order of most common to least common, for all words that appear 50x or more in the Hebrew Bible
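
For the site itself, the two requirements above (frequency order, 50x cutoff) plus the unit sizes mentioned earlier come down to a filter, a sort, and a chunking step. Here is a minimal Python sketch of that step, assuming the frequencies are already in a dictionary; the function name `build_units` and the sample data are hypothetical.

```python
def build_units(freqs, min_count=50, unit_size=20):
    """Keep words with count >= min_count, most frequent first, in chunks."""
    kept = sorted(
        (word for word, n in freqs.items() if n >= min_count),
        key=lambda word: freqs[word],
        reverse=True,
    )
    return [kept[i:i + unit_size] for i in range(0, len(kept), unit_size)]


# Made-up data: 60 placeholder words with counts 40..99.
sample = {f"w{i}": 40 + i for i in range(60)}

learning_units = build_units(sample, unit_size=20)
review_units = build_units(sample, unit_size=100)
print(len(learning_units), len(review_units))
```

With roughly 700 qualifying words, the same call would give about 35 learning units of 20 and about 7 review units of 100, which is where the list counts above come from.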

Joe Fleener
08-23-2005, 06:07 AM
If I understand what you are looking for (which sounds like a very helpful project) here is what I would do:

1. Open the Word List Manager

2. Click on Load or Generate Word List

3. Select WTM as your version

4. Set Source = "Load words from Bible Version"

5. Range should be set to include entire OT

6. Filter Words should be set to * only

7. Keep Morph codes should be UNchecked

8. Keep Greek accents and Hebrew Vowel point should be CHECKED

9. Click create list.

10. Once the list is created click sort - Sort by Frequency

11. Click on File - Export as RTF

You can then open the exported file in Word or a similar program and you will have all the lemmas and their frequencies.

You will see that your top 10 list is correct.

Hope this helps.

Joshua Luna
08-23-2005, 02:01 PM
Thanks Joe and Charlie!

:o I should have taken Charlie's advice to play with the WLM more (been busy...)

Joe: Your advice produced what I wanted -- with one small issue: homonyms. But based on the information Charlie gave earlier and your very detailed, helpful steps I was able to first generate all the words, and then make a secondary list of all words that have at least one homonym (Hb).

There are only 37 words appearing ~50x or more that have at least one homonym, which means I only need to search those 37 words manually to find the frequencies for Ha, Hb, Hc, etc., and then re-order them and list only the entries that appear 50x or more :D

If anyone is interested, Joe gave the first search. The second search, based on the info Charlie provided, was:

*@*Hb*

Now don't I feel like a bonehead for not figuring that out myself! :D
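
For anyone repeating this, the re-ordering step can be sketched in Python as below. It assumes you have already collected the per-homonym counts by hand; the function name `split_homonyms` is hypothetical, the numbers are the ones quoted earlier in this thread, and treating I/II as corresponding to Ha/Hb is an assumption.

```python
def split_homonyms(lemma_freqs, homonym_freqs, min_count=50):
    """Replace combined entries with per-homonym ones, then filter and sort."""
    entries = {}
    for lemma, total in lemma_freqs.items():
        parts = homonym_freqs.get(lemma)
        if parts is None:
            entries[lemma] = total  # no homonyms: keep the combined count
        else:
            for label, n in parts.items():
                entries[f"{lemma} {label}"] = n
    kept = [(word, n) for word, n in entries.items() if n >= min_count]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Combined counts and manual per-homonym counts (figures from this thread).
lemma_freqs = {"qr'": 875, "bt": 600, "'lp": 507}
homonym_freqs = {
    "qr'": {"I": 739, "II": 136},
    "bt": {"I": 587, "II": 13},
    "'lp": {"I": 496, "II": 11},
}

result = split_homonyms(lemma_freqs, homonym_freqs)
print(result)
```

Note that bt II and 'lp II fall below the 50x cutoff once separated, so they drop out of the list, while qr' II stays in as its own entry.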

Again, thank you Joe for your speedy and spot-on assistance! I am hoping this project will be a blessing to others. The project I am working on is in PHP. I had considered Flash because you avoid the font issues, but then you run into the hurdle that some people have it and others do not. I am not proficient at working with Flash and MySQL, so I am using a modified PHP script.

I mention this because look what I found last night:

http://home.earthlink.net/~vikn/

John Allred has made a REALLY nice Flash based flash card program. Some of the really nice features I found were

- You can mix multiple flash card "stacks"
- You can either take quizzes, practice, use it like a flash card, or preview the word list for study
- Multiple choice and/or fill in the blank options (seems to require only part of a definition to work, too! So on longer words you need only give the base definition)
- Words are broken up by frequency into numerous groups => nouns, verbs, types of verbs (strong, hollow, etc), numbers, pronouns, pronominal suffixes

This tool is for Hebrew and Greek. Requires Flash Player 6.

A very nice, powerful tool. I am afraid my effort won't be quite as flexible, but I hope to offer a similar array of lists and helps.