Code Conversion

amans · Post by **amans** » Mon Jul 04, 2005 8:48 pm

I am working on a project and I must find a way to represent Greek on a web page.

I gather Unicode is the best solution. If I understand this system correctly, all characters, ordinary Greek letters as well as all the accented vowels, come in codes resembling this:

& #966;& #952;& #8051;& #947;& #956;& #945; (please ignore the spaces)

Is it correct that web pages with this coding can be universally viewed, even from computers without SP Ionic, Sgreek, and the like?

The Greek I have is either in beta code or "ready made" Greek, from a text editor like Microsoft's Word.

My question is: how do I convert the various formats into Unicode (like the coded sequence above)? I have tried Sean Redmond's converter but it does not seem to work very well with accented capitals and final sigmas, for instance...

Thanks for any comments or suggestions...

Paul · Post by **Paul** » Mon Jul 04, 2005 9:09 pm

Hi amans,

Unicode is a character set. In its commonest form, every character is represented by two bytes of data. Two bytes can hold 65,536 distinct values. Hence, in this two-byte form, Unicode can represent that many characters.

Different ranges, i.e., different parts of the range 0-65535, have been allocated to represent different languages. Greek uses the ranges 880-1023 and 7936-8191.
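To make the ranges concrete, here is a minimal Python sketch that checks whether each character of a word falls inside the two Greek ranges quoted above. The function name `in_greek_range` and the constant `GREEK_RANGES` are made up for illustration; the numbers are the ones from the sample sequence in the first post.

```python
# The two Greek ranges mentioned above, as decimal code numbers.
GREEK_RANGES = [(880, 1023), (7936, 8191)]

def in_greek_range(ch):
    """Return True if the character's code number lies in a Greek range."""
    cp = ord(ch)  # ord() gives the numeric code of a character
    return any(lo <= cp <= hi for lo, hi in GREEK_RANGES)

# Build the word from the numbers in the sample sequence.
word = "".join(chr(n) for n in [966, 952, 8051, 947, 956, 945])

for ch in word:
    print(ch, ord(ch), in_greek_range(ch))
```

Note that 8051 (an accented epsilon) falls in the second range, while the unaccented letters all fall in the first.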

It is important to distinguish between a character and a glyph. Characters are numbers, as just described. A glyph is the look or shape of a character.

Whether or not a web page can be 'universally viewed' depends on the font the web browser is using. If the font is a 'unicode font', then you should be able to see the proper glyphs for any unicode character. A unicode font is simply a font that can map every unicode character to its corresponding glyph. More simply, a unicode font 'knows' what glyph to display for every character in the Unicode character set.

What formats are you trying to convert to Unicode? If you need Betacode converted to Unicode, I can handle that for you easily.

You can also obtain Tavultesoft's Keyman program. This is keyboard mapping software that allows you to enter data directly in Unicode.

With respect to 'final sigmas': are you feeding Sean's converter proper betacode or the odd variant we use here? Letter 'j' is not final sigma in proper betacode.

I hope this helps. If not, please ask again.

Cordially,

Paul

amans · Post by **amans** » Tue Jul 05, 2005 2:26 pm

Hi Paul

Thanks a bunch for your reply. I think I understand the distinction between code and glyph: I see glyphs on my monitor, but to the computer they are merely codes or characters, right?

Perhaps I should have stated my problem this way instead: I would like to display ancient Greek on a web page I am working on (I am working in simple HTML: I am new to this . . . ). How should I go about it, if I want it to be viewable from computers with many different configurations ("universally viewable")?

The text I want could be anything Greek: e.g. I have some Anakreontea in a book I'd like to type up. Or it could be assignments or anything really.

I can write in Greek with Microsoft's Word. I can shift the keyboard and with the proper font I can choose all the diacritical marks from the Symbol Box. I already have the Keyman from Tavultesoft. It works very well, too. Or, I can write in beta code . . .

I have, btw, noticed that I can see the Greek at Bibliotheca Augustana from a pc without SP Ionic and the like.

You are right regarding the final sigma. My "beta code" has j's. What is the proper beta code for this? Also: do you have any idea why Sean's converter doesn't handle the accented capitals very well?

Thanks again

adz000 · Post by **adz000** » Tue Jul 05, 2005 4:02 pm

There is also the Greek transcoder (http://www.greektranscoder.org/) which is some sort of Word macro for converting between different Greek fonts. I haven't ever used it, but perhaps someone else has?

Paul · Post by **Paul** » Tue Jul 05, 2005 6:57 pm

amans wrote: I think I understand the distinction between code and glyph: I see glyphs on my monitor, but to the computer they are merely codes or characters, right?

Right.

amans wrote: I can write in Greek with Microsoft's Word. I can shift the keyboard and with the proper font I can choose all the diacritical marks from the Symbol Box. I already have the Keyman from Tavultesoft. It works very well, too. Or, I can write in beta code . . .

It sounds like you should have no difficulty creating an HTML file that contains Unicode. But having saved the Unicode file, do be careful to open and save it again only with a Unicode-aware text editor, such as Notepad or MS Word.
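If you would rather have the coded sequences from your first post, which are plain ASCII and so survive any editor, a few lines of Python can produce them from Unicode text. This is only a sketch: `to_ncr` is a name I made up, and it simply turns every non-ASCII character into its decimal code between `&#` and `;`.

```python
import html

def to_ncr(text):
    """Replace each non-ASCII character with a decimal numeric reference."""
    return "".join(
        ch if ord(ch) < 128 else f"&#{ord(ch)};"
        for ch in text
    )

# The sample word, built from its code numbers.
word = "".join(chr(n) for n in [966, 952, 8051, 947, 956, 945])

encoded = to_ncr(word)
print(encoded)

# html.unescape() from the standard library reverses the conversion.
print(html.unescape(encoded) == word)
```

The result can be pasted straight into the body of an HTML file, and ordinary ASCII text passes through unchanged.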

amans wrote: I have, btw, noticed that I can see the Greek at Bibliotheca Augustana from a pc without SP Ionic and the like.

Bibliotheca Augustana uses Unicode. Spionic is not a Unicode font. It was designed expressly to map Betacode (which is ASCII) to the appropriate glyphs.

amans wrote: You are right regarding the final sigma. My "beta code" has j's. What is the proper beta code for this? Also: do you have any idea why Sean's converter doesn't handle the accented capitals very well?

Proper betacode for medial and final sigma is simply 's'. It is the converter's responsibility to distinguish sigmas and show the proper glyph. Strictly speaking, betacode does support s1 for medial, s2 for final. But I wouldn't count on a converter to handle this.
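To illustrate what "the converter's responsibility" means here, below is a rough Python sketch of the sigma rule only. It assumes lowercase betacode with no accents, breathings, or capitals, so it is nowhere near a full converter, and `beta_to_unicode` is a made-up name. A plain 's' becomes final sigma when no letter follows it; s1 and s2 are honoured explicitly.

```python
import re

# Unaccented lowercase betacode letters and their Greek equivalents.
LETTERS = dict(zip("abgdezhqiklmncoprstufxyw",
                   "αβγδεζηθικλμνξοπρστυφχψω"))

MEDIAL_SIGMA = "\u03c3"  # σ
FINAL_SIGMA = "\u03c2"   # ς

# Order matters: explicit s1/s2 first, then 's' with no letter after it,
# then any other letter.
TOKEN = re.compile(
    r"(?P<medial>s1)|(?P<final>s2|s(?![a-z]))|(?P<letter>[a-z])"
)

def beta_to_unicode(text):
    def convert(match):
        if match.lastgroup == "medial":
            return MEDIAL_SIGMA
        if match.lastgroup == "final":
            return FINAL_SIGMA
        # Letters outside the table (e.g. 'j') are left untouched.
        return LETTERS.get(match.group(0), match.group(0))
    return TOKEN.sub(convert, text)

print(beta_to_unicode("sofos esti"))
```

Because 'j' is not in the table, text typed with the local 'j' variant would come through with the j's unconverted, which is one way such a mismatch shows up.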

Cordially,

Paul