I work for a scanning/imaging and data mining software company. We have several programs that are excellent at converting paper documents into searchable PDFs, as well as mining the data, extracting all pertinent metadata (including the body text), and writing it to a SQL back-end. The challenge here is to take the existing PDF files and convert them into searchable PDFs, meaning OCR the images in the PDFs into actual text.
I looked at the PDFs from the Seforim website. They are in fairly bad condition for OCR'ing, but not a hopeless case; it's quite possible to OCR them. I can only promise to give it a shot next week and keep you posted on my progress.
If this fails, is there any way to get an actual copy of the Ginsburg Massorah?
Michael, I tried the module you compiled, but it failed to show up. I am running BW 6.0.012y. I looked for it in the "Resources" column on the menu bar, but did not see it there. Is this where it is supposed to be?
Last edited by ugotdave; 02-10-2007 at 04:03 PM.
Be diligent to present yourself approved to God
as a workman who does not need to be ashamed,
handling accurately the word of truth.