Creating non-Latin script versions

MEJ Buijs
03-17-2007, 11:46 AM
As I have mentioned before, I am thinking about patching together a BibleWorks version of Saadia Gaon's Arabic translation (in Hebrew characters). Having looked through some help files, however, I'm not quite sure how I would create a text in Hebrew characters as a plain text file for the compiler.

I glanced at some of the wonderful work Michael Hanel has done - kol ha-kavod for which! - and got an idea of how it would work with a Greek text; but have no idea how to proceed with right-to-left scripts like Hebrew, Syriac or Arabic. Could anyone give me some advice or point me to a FAQ on this?

I'm also very happy with Brian Beers's efforts to bring us the Qur'an in Shakir's translation; it's made me think of the possibility of making an Arabic Qur'an, as well. That would be a more complicated task yet, though, because it would require full Arabic vowel pointing as well.


- Martijn Buijs

04-13-2008, 06:25 AM
An Arabic Qur'an would be nice, and it would be even nicer to have it morphologically tagged...

04-14-2008, 03:51 PM
The best thing you could do would be to export a few verses of the WTT or some other Hebrew version and see how it is encoded. You can also import Hebrew using the CCAT transliteration scheme, a.k.a. "beta code" (http://ccat.sas.upenn.edu/gopher/text/religion/biblical/parallel/00.betacode.txt); I think this would be the easiest route. In CCAT, Genesis 1:1 and 1:4 would look like the following:
Gen 1:1 B_R)$YT BR) )LHYM )T H_$MYM W_)T H_)RC
Gen 1:4 W_YR) )LHYM )T-_H_)WR KY-_+WB W_YBDL )LHYM BYN H_)WR W_BYN H_X$K

This would compile correctly in "CCAT" format (see the Version Database Compiler help files). I exported the text using the export tool (under the "Tools" menu) from the WTT, and I saved it in "CCAT" format. Once I exported it, I then opened the resultant file with a text editor to take a look. It's helpful to re-save it as a "txt" file or rename it with a "txt" extension for future use.

Note the following:

Each verse is on its own line and is preceded by the three-letter book name abbreviation, a space, then the chapter number, a colon and the verse number followed by a space (e.g. Gen 1:1).
In CCAT encoding you do not have to worry about getting the text in right-to-left formatting. The Hebrew characters are "transliterated" into plain ASCII characters, written from left to right.
) = aleph
( = ayin
+ = tet
X = chet
& = sin
$ = shin
# = unpointed sin/shin
Underscores mark breaks between prefixes and words. This is helpful if a morphological text is produced as well, but it is not necessary for a simple non-tagged version.
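
If your source text is already in Unicode Hebrew, the points above can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions, not anything that ships with BibleWorks: the function names (to_ccat, format_verse) and the VERSE_LINE pattern are made up for illustration, vowel points and accents are simply dropped, and no underscore prefix markers are produced (which should be fine for a non-tagged version). The consonant codes follow the CCAT table linked above; final letter forms get the same code as the regular forms.

```python
import re
import unicodedata

# Hebrew consonants -> CCAT codes (final forms share the regular code).
CONSONANTS = {
    "\u05D0": ")",  # aleph
    "\u05D1": "B", "\u05D2": "G", "\u05D3": "D", "\u05D4": "H",
    "\u05D5": "W", "\u05D6": "Z",
    "\u05D7": "X",  # chet
    "\u05D8": "+",  # tet
    "\u05D9": "Y",
    "\u05DA": "K", "\u05DB": "K",  # final kaf, kaf
    "\u05DC": "L",
    "\u05DD": "M", "\u05DE": "M",  # final mem, mem
    "\u05DF": "N", "\u05E0": "N",  # final nun, nun
    "\u05E1": "S",
    "\u05E2": "(",  # ayin
    "\u05E3": "P", "\u05E4": "P",  # final pe, pe
    "\u05E5": "C", "\u05E6": "C",  # final tsade, tsade
    "\u05E7": "Q", "\u05E8": "R", "\u05EA": "T",
}
SHIN = "\u05E9"
SHIN_DOT = "\u05C1"
SIN_DOT = "\u05C2"
MAQAF = "\u05BE"

# Assumed line shape: 3-character book abbreviation, space, chapter:verse, space, text.
VERSE_LINE = re.compile(r"^[A-Za-z0-9]{3} \d+:\d+ \S.*$")


def is_mark(ch):
    """True for combining marks (vowel points, accents, shin/sin dots)."""
    return unicodedata.category(ch) == "Mn"


def to_ccat(hebrew):
    """Transliterate Unicode Hebrew into consonant-only CCAT ASCII."""
    out = []
    for i, ch in enumerate(hebrew):
        if ch == SHIN:
            # Look at the marks attached to this letter to pick $, & or #.
            j = i + 1
            while j < len(hebrew) and is_mark(hebrew[j]):
                j += 1
            marks = hebrew[i + 1:j]
            if SHIN_DOT in marks:
                out.append("$")
            elif SIN_DOT in marks:
                out.append("&")
            else:
                out.append("#")  # unpointed sin/shin
        elif ch in CONSONANTS:
            out.append(CONSONANTS[ch])
        elif ch == MAQAF:
            out.append("-")
        elif ch == " ":
            out.append(" ")
        elif is_mark(ch):
            continue  # vowel points and accents are dropped in this sketch
        else:
            raise ValueError(f"no CCAT code for {ch!r} (U+{ord(ch):04X})")
    return "".join(out)


def format_verse(book, chapter, verse, text):
    """Build one compiler input line, e.g. 'Gen 1:1 BR)#YT'."""
    line = f"{book} {chapter}:{verse} {text}"
    if not VERSE_LINE.match(line):
        raise ValueError(f"malformed verse line: {line!r}")
    return line
```

Something like format_verse("Gen", 1, 1, to_ccat(text)) then gives you one line per verse, and because the output is pure ASCII you can write it out with any plain-text editor or script. Raising an error on unknown characters is deliberate: it is easier to fix a stray character in the source than to hunt for it after the compiler chokes.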

04-14-2008, 05:17 PM
Hi Martijn,

May I ask --- why in Hebrew characters?
Why not in Arabic, fully vocalized?

What I would like to mention is that Saadia Gaon's translation of the Old Testament is going to be included in the Polyglot Bible project of Bibles.org.uk. But we need people to help out with entering the text (all the stuff required for this will be provided for free, of course).

You can download the sample of the Polyglot Bible from here:


We use ArabTeX for typesetting the Arabic column; this implies a simple ASCII transliteration for both the consonants and vowels. One can then use this source code to convert to the BibleWorks (or any other) encoding trivially.

So, I see no need for two people doing essentially the same thing. Help us out and you will be killing two birds with one stone. Needless to say, Polyglot Bible will be made available for free in PDF format and for printing-cost-only (no royalties/profit) in printed form when it is ready. So, the more helpers I get the sooner the result will be ready for everyone to enjoy.

I haven't officially announced the Polyglot Bible project yet because it is still at a very early stage. So this is the first place it is mentioned publicly (historians, note carefully and make no mistakes when you write about it! :)

Shalom uvrakha
Tigran Aivazian

10-15-2009, 11:50 PM
I've been working on modifying an Arabic Qur'an text myself, and I think I have the text file in the correct form for the compiler. The major difficulty I am having is this: BibleWorks 8 apparently has the capability of displaying Unicode Arabic text (as represented by the Van Dyke version of the Arabic Bible), and it also has the ability to display English versions of the Qur'an, but user-created versions still have to be reformatted in CCAT.

I was able to download the macro for Unicode -> CCAT, and I can import the text into Word and run the macro. Unfortunately, Word won't allow the document to be saved in that format (it prompts me to resave in Unicode, which defeats the whole purpose of running the macro).

Any suggestions?