Originally Posted by acheung
I'm doing this with New Testament texts, so the same thing in a Hebrew text may yield results that are a little different. Here's what I've done to build a list from Romans, excluding proper names from a search:
1. In the context tab, right-click on "Export list to Word list Manager"; this will send the list to the "Main word list" (i.e., the left-hand column).
2. With the Word List Manager (WLM) still open, select BGM as your search text and limit your search to Romans. Then type the following in the command line: .*@n???p (the "p" at the end stands for "proper name") and press Enter. This will give you a list of all proper names in Romans.
3. In the WLM, select "Secondary word list", then "load or create word list"
4. In the new window that opens, select BGM as your search version, then "load highlighted words from last query" (as said in a previous post, I take the precaution of selecting "use search window limits"). Then deselect "keep Greek accents and Hebrew vowel points". This seems to be necessary, as the list created from the context tab doesn't have accents (I learned that the hard way! More on that below). Then create the list.
5. In the WLM, you now have all the words in Romans in your Main word list, and all the proper names in Romans in your Secondary word list.
6. Select "Main Word List" (radio button), then click on "select" => "select words common to both lists"
7. Then "Edit" => "delete selected." That will give you your list of all the words in Romans in the "Main Word list", minus all the proper names.
8. You can then select the words that occur more than 50 times, and delete them from the list.
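Outside BibleWorks, steps 1 to 8 boil down to set operations on word lists: take every unique word in the book, subtract the proper names, then drop anything occurring more than 50 times. The sketch below is only an illustration of that logic, not BibleWorks' own implementation. It assumes the book is available as hypothetical (word, morphology_tag) pairs, and the tag layout ("n", three positions for case/number/gender, then "p" for a proper name) is an assumption modeled on the .*@n???p query; `fnmatchcase` is used because its "?" also matches exactly one character.

```python
from collections import Counter
from fnmatch import fnmatchcase


def rare_vocabulary(tokens: list[tuple[str, str]], max_count: int = 50) -> set[str]:
    """Return the book's unique words minus proper names and frequent words.

    `tokens` holds hypothetical (word, morphology_tag) pairs; the tag format
    is an assumption, not BibleWorks' exact morphology codes.
    """
    counts = Counter(word for word, _ in tokens)
    main_list = set(counts)  # step 1: every unique word in the book
    # steps 2-4: "secondary list" of words whose tag looks like n???p
    proper_names = {word for word, tag in tokens if fnmatchcase(tag, "n???p")}
    # steps 6-7: delete the words common to both lists
    remaining = main_list - proper_names
    # step 8: drop words occurring more than max_count times in the book
    return {word for word in remaining if counts[word] <= max_count}


# Hypothetical, unaccented sample data
tokens = [("Παυλος", "nnsmp"), ("δουλος", "nnsmc"), ("θεου", "ngsmc"), ("θεου", "ngsmc")]
print(sorted(rare_vocabulary(tokens, max_count=1)))  # ['δουλος']
```

With the default cutoff of 50 both δουλος and θεου survive; lowering `max_count` to 1 shows the frequency step removing θεου.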
A couple of things: between steps 2 and 3, BW will automatically send the results of your query on proper names to the WLM, in the Secondary word list. This will not help, though, because it sends the words with their accents, and the WLM doesn't seem to be able to compare the two lists (i.e., Παυλος and Παῦλος, for instance, are seen as two different words). So you'll have to disregard that automatically generated list and proceed to step 3.
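The Παυλος/Παῦλος mismatch is what you get with any plain string comparison: the accented form contains an extra mark, so the two strings are not equal. As an illustration outside BibleWorks (whose internal text encoding is not assumed here), a Unicode-based sketch can decompose each word with NFD and drop the combining marks (category "Mn"), which covers Greek accents and breathings as well as Hebrew vowel points and cantillation. The function name `strip_marks` is hypothetical.

```python
import unicodedata


def strip_marks(word: str) -> str:
    """Remove all combining marks (Greek accents/breathings, Hebrew points)."""
    # NFD splits precomposed letters such as ῦ into base letter + combining mark
    decomposed = unicodedata.normalize("NFD", word)
    bare = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever is left so equal words compare equal
    return unicodedata.normalize("NFC", bare)


print(strip_marks("Παῦλος"))              # Παυλος
print(strip_marks("Παῦλος") == "Παυλος")  # True
```

Normalizing both lists this way before comparing is the programmatic counterpart of deselecting "keep Greek accents and Hebrew vowel points" in step 4.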
One other thing: Did you know that you can also create a lexicon for that list? This can be helpful for learning the rarer words. Go to "File" => "make lexicon from selected words", then follow the steps there.
I'd be interested in hearing if this works as well in Hebrew texts (WTM). Please let me know!
Originally Posted by Donald Cobb
Thanks for the very helpful advice with clear steps. The method also works well for Hebrew (with the appropriate changes from BNM to WTM and .*@n???p to .*@np*). I did need to modify step 8, because the word frequencies shown are those for that book alone and may not be representative of the NT or OT as a whole (e.g. a common NT word may occur rarely in Romans). The modified steps are:
8a. clear the secondary list
8b. type l [Enter] to remove the limit to the book, then .*@* to generate a word list for the whole NT (BNM) or OT (WTM)
8c. save the list for later use (name it, say, "BNM word frequency list"), and from it create lists for the desired frequency cutoffs by deleting the words that occur X times or fewer. This will leave only the words occurring more than X times in the NT/OT on the secondary word list.
8d. Select "Main Word List" (radio button), then click on "select" => "select words common to both lists"
8e. Then "Edit" => "delete selected." That will give the list of all the words in e.g. Romans in the "Main Word list", minus all the proper names and words occurring more than X times.
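The key change in 8a to 8e is where the frequencies come from: they are counted over the whole NT or OT rather than within the book. A minimal Python sketch of that idea, assuming the corpus is available as a plain list of unaccented words and that the book list has already had its proper names removed (as after step 7); the function names are hypothetical.

```python
from collections import Counter


def frequent_words(corpus_words: list[str], threshold: int) -> set[str]:
    """Steps 8b-8c: words occurring more than `threshold` times in the whole corpus."""
    counts = Counter(corpus_words)
    return {word for word, n in counts.items() if n > threshold}


def difficult_words(book_list: set[str], corpus_words: list[str], threshold: int = 50) -> set[str]:
    """Steps 8d-8e: delete corpus-frequent words from the book's word list."""
    return book_list - frequent_words(corpus_words, threshold)


# Hypothetical data: θεος is frequent in the corpus, so it is removed even
# if it happens to be rare inside the book being studied.
corpus = ["θεος"] * 3 + ["λογος"] * 2 + ["αγαπη"]
book = {"θεος", "αγαπη"}
print(difficult_words(book, corpus, threshold=2))  # {'αγαπη'}
```

Because the cutoff is applied to corpus-wide counts, a word that is common in the NT but appears only once or twice in Romans no longer slips through as "rare".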
With Isaiah as my selected book and X = 50, the results are:
1291 verses, 2068 unique words, 23248 total words, and 2946 "more difficult" words (what remains after removing proper nouns and 50+ words), a ratio of 12.7% of the total number of words
For Ruth, the numbers are, respectively, 81, 301, 1823, and 125, yielding a ratio of 6.9%, showing that it is a much easier book to read than Isaiah.
For Job, the ratio is 16.1%, making it somewhat more difficult than Isaiah, as expected.
For Ezekiel, the ratio is 8.4%, between Ruth and Isaiah, again as expected. But it is good to be able to quantify the difficulty level with an objective measure like this.
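The ratio in these results is the count of "more difficult" words divided by the book's total word count. A short Python check reproducing the two ratios for which all the figures are given (the function name is hypothetical):

```python
def difficulty_ratio(difficult: int, total: int) -> float:
    """Percentage of the book's total word count made up of 'more difficult' words."""
    return 100 * difficult / total


# Figures reported above, with X = 50
for book, difficult, total in [("Isaiah", 2946, 23248), ("Ruth", 125, 1823)]:
    print(f"{book}: {difficulty_ratio(difficult, total):.1f}%")
# Isaiah: 12.7%
# Ruth: 6.9%
```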