
Originally Posted by
bobvenem
I've downloaded roughly 1600 volumes from Archive.org, all of them in .pdf format (EXTREMELY large, photographic reproductions).
I considered what you suggest, but the problem with their .txt files is that they are direct OCR output from the photographic reproductions, crinkled pages, discolorations and all. Consequently, the .txt files contain a huge number of garbage characters, and it would take an immense amount of work to turn them into BW7 databases. (Not having worked much with HTML files, I can't speak to that issue.)
As a result, I decided to download the much larger .pdf files and, with Adobe Acrobat, add bookmarks to the volumes. I have tried to create workable text files from these, and it can be done (even from Acrobat Reader 8 using cut and paste), but this is also a long process (more Wycliffe, anyone?).
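For anyone who wants to try salvaging the OCR .txt files anyway, here is a rough Python sketch of a first-pass filter that drops lines made up mostly of garbage characters. It is only an illustration: the function names, the set of "acceptable" punctuation, and the 0.3 threshold are all assumptions of mine, not anything Archive.org or BW7 provides, and it will not fix misrecognized words inside otherwise readable lines.

```python
# Hypothetical helper names; the threshold and punctuation set are guesses
# that would need tuning against real OCR output.
ALLOWED_PUNCTUATION = ".,;:'\"!?()-"

def garbage_ratio(line: str) -> float:
    """Return the fraction of non-space characters that are neither
    letters/digits nor common punctuation."""
    chars = [c for c in line if not c.isspace()]
    if not chars:
        return 0.0  # blank lines are treated as clean
    ok = sum(1 for c in chars if c.isalnum() or c in ALLOWED_PUNCTUATION)
    return 1.0 - ok / len(chars)

def clean_ocr_text(text: str, max_ratio: float = 0.3) -> str:
    """Keep only lines whose garbage ratio is at or below max_ratio."""
    kept = [line for line in text.splitlines()
            if garbage_ratio(line) <= max_ratio]
    return "\n".join(kept)

if __name__ == "__main__":
    sample = (
        "In the beginning was the Word.\n"
        "~^|#@ %% ||~ ^^\n"
        "\n"
        "And the Word was with God."
    )
    print(clean_ocr_text(sample))
```

A filter like this only removes the worst lines (page-edge smudges, rules, stains read as symbols); the remaining text would still need proofreading before it could become a usable database.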