Word occurrence vs. LCS?



Michael Burer
07-15-2009, 07:19 PM
I am having trouble understanding the difference between the "word occurrence" and LCS methods in the text comparison settings. Here's what the help file says:


The LCS method is the most commonly used, and you should probably use it unless you have reasons to do otherwise. LCS stands for "Least Common Substring." LCS finds the set of common words with the least total length. The other method, the "Word Occurrence" method, finds words that occur in only one of the verses being compared.

I think I understand the differences here, but I don't see practically how the results would change. Can anyone offer an explanation or some examples which would show how these two methods change the way texts are compared?

Thanks!

Glenn Weaver
07-16-2009, 09:52 AM
Hi Mike,

Probably the biggest practical difference is that some words will not appear as differences if you select the Word Occurrence comparison method.

For example, compare the BGT and SCR in John 1:18. The article 'O' before MONOGENHS in the SCR appears as a difference when the LCS method is used, but does not appear as a difference when the Word Occurrence method is used. The difference does not appear for the Word Occurrence method because the article 'O' appears elsewhere in the verse.

The Word Occurrence method will only highlight words that do not appear in the version(s) being compared. In the same verse, QEOS is highlighted in the BGT (since this word does not appear in the same verse in the SCR), and UIOS is highlighted in the SCR (since this word does not appear in the same verse in the BGT).
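To make the rule above concrete, here is a minimal Python sketch of a word-occurrence comparison. It is not BibleWorks' actual code. The function name `word_occurrence_diff` is made up for illustration, and the two strings are short fragments modeled on the John 1:18 example, not the full verse text. It assumes a word counts as "present" if it occurs anywhere in the other verse, however many times.

```python
def word_occurrence_diff(verse_a, verse_b):
    """Return the words of each verse that never occur in the other verse.

    Hypothetical helper: a set-membership test, which is an assumption about
    how a "word occurrence" comparison behaves, not the program's real code.
    """
    words_a = verse_a.split()
    words_b = verse_b.split()
    set_a, set_b = set(words_a), set(words_b)
    # Word order and repetition are ignored; only presence matters.
    only_in_a = [w for w in words_a if w not in set_b]
    only_in_b = [w for w in words_b if w not in set_a]
    return only_in_a, only_in_b


# Illustrative fragments only (assumed, shortened from the example above).
bgt = "MONOGENHS QEOS O WN"
scr = "O MONOGENHS UIOS O WN"

only_bgt, only_scr = word_occurrence_diff(bgt, scr)
print(only_bgt)  # ['QEOS']
print(only_scr)  # ['UIOS']
```

The extra O at the start of the second fragment is not reported, because O also occurs in the first fragment.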

Word order is a part of the LCS method, but is not a part of the Word Occurrence method. For example, compare the BGT and SCR in Rom 1:1. The BGT has CRISTOU IHSOU, while the SCR has IHSOU CRISTOU. No words are highlighted when using the Word Occurrence method, since each word in each version appears in the other version.

When using the LCS method, the word highlighted in each version is CRISTOU. (If you entered the SCR first in the Text Comparison Settings Tool, then IHSOU is the highlighted word, since it appears first in the SCR text.) When using the LCS method, BibleWorks recognizes that there is a difference in the string of words, and so the text is highlighted. Since CRISTOU in the BGT does not appear in the SCR when moving character-by-character through the text, this appears as a difference. And after IHSOU in the SCR, the word CRISTOU appears but is not in the BGT (since the word already appeared before IHSOU in the BGT), so this is marked as a difference.
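For anyone who wants to see the word-order effect in code, here is a rough Python sketch of an LCS-style comparison working on whole words. In the computer science literature LCS is usually expanded as "longest common subsequence": the longest run of words that appears in both texts in the same order, not necessarily adjacent. The function name `lcs_diff` and the tie-breaking rule are assumptions chosen for illustration; this is not BibleWorks' implementation.

```python
def lcs_diff(verse_a, verse_b):
    """Return the words of each verse left outside one longest common subsequence."""
    a, b = verse_a.split(), verse_b.split()
    n, m = len(a), len(b)
    # table[i][j] = length of the LCS of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table: matched words are kept, everything else is "highlighted".
    i = j = 0
    diff_a, diff_b = [], []
    while i < n and j < m:
        if a[i] == b[j]:
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Assumed tie-break: on a tie, skip a word of the first verse.
            diff_a.append(a[i])
            i += 1
        else:
            diff_b.append(b[j])
            j += 1
    diff_a.extend(a[i:])
    diff_b.extend(b[j:])
    return diff_a, diff_b


print(lcs_diff("CRISTOU IHSOU", "IHSOU CRISTOU"))  # (['CRISTOU'], ['CRISTOU'])
print(lcs_diff("IHSOU CRISTOU", "CRISTOU IHSOU"))  # (['IHSOU'], ['IHSOU'])
```

With this particular tie-break, swapping which version comes first changes the highlighted word, the same kind of order dependence described for Rom 1:1.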

Here are some practical uses for each method. It depends upon what you want to find.

--If you want to see only the words that do not appear at all in the compared versions in that verse, the Word Occurrence method is the one to use.

--If you want to see differences where the same word may appear elsewhere in the verse in the compared versions, then the LCS Method is the one to use.

--If you want to see differences in word order between compared versions in that verse, then the LCS Method is the one to use.

ISalzman
07-16-2009, 10:28 AM
Great answer, Glenn. Thanks!

SCSaunders
07-16-2009, 11:14 AM
Ditto.

Good question with a very practical value, good answer to match.

Do I actually, completely understand the good answer? Maybe in time, when my head stops hurting from trying to wrap itself around it. Hurts like a sunburn and I'm bald.

At any rate, it's amazing what the BW programmers can code into the software and that users are seeking to master it.

Ken Neighoff
07-16-2009, 11:18 AM
The best way I have found to understand the answer to the question is to recreate the steps in BW.

Nothing like hands-on training.

Precha1
07-16-2009, 11:47 AM
This method has helped me learn BW and gain insight into Inductive Bible Study.

MBushell
07-16-2009, 12:09 PM
The simplest way to look at the LCS method is this: if you compare two versions with the LCS method, deleting the highlighted parts will result in the two verses being exactly the same. And it guarantees that the number of such deleted words will be a minimum (there are sometimes different ways to produce identical non-highlighted subsets).
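The remark about "different ways" can be checked by brute force on a short example. The Python sketch below is only an illustration, not how BibleWorks does it. The function name `all_longest_common_subsequences` is made up, and the approach is exponential, so it only suits verses of a few words. It lists every longest set of words that can stay unhighlighted in both versions.

```python
from itertools import combinations


def all_longest_common_subsequences(verse_a, verse_b):
    """Brute-force every longest common word subsequence of two short verses."""
    a, b = verse_a.split(), verse_b.split()

    def subsequences(words, size):
        # Every way to keep `size` words while preserving their order.
        return {tuple(words[i] for i in idx)
                for idx in combinations(range(len(words)), size)}

    # Try the largest size first; size 0 (the empty tuple) always matches,
    # so the loop always returns.
    for size in range(min(len(a), len(b)), -1, -1):
        common = subsequences(a, size) & subsequences(b, size)
        if common:
            return sorted(common)


kept = all_longest_common_subsequences("CRISTOU IHSOU", "IHSOU CRISTOU")
print(kept)  # [('CRISTOU',), ('IHSOU',)]

# Words highlighted across both versions = total words - 2 * words kept.
print(2 + 2 - 2 * len(kept[0]))  # 2
```

For the Rom 1:1 word pair there are two equally good answers: keep CRISTOU and highlight IHSOU in each version, or keep IHSOU and highlight CRISTOU in each. Either way two words are highlighted in total, and deleting them leaves the two texts identical.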

Glenn's discussion was very good. Do some sample searches and it will make sense.

Mike

SCSaunders
07-16-2009, 03:28 PM
Thanks one and all. Your input is appreciated.

FWIW, I just got back from seeing Transformers w. my dad. A bit long, but the FX are amazing. If you've got some time on your hands ....