This past week I tried out Anagnostis 4.1 on the apparatus criticus of the Goettingen Septuaginta edition of Obadiah. Anagnostis has Ancient Greek, Modern Greek, and Greek/English modes, so it was worth a try. There are also some tricks of the trade floating around as lore in the UPenn Center for Computer Analysis of Texts (CCAT). I hoped that some combination of software provisions and lore would allow passable OCR of the app crit. Here is what I found:
First of all, I have to qualify my findings: I was using the crippleware trial version of Anagnostis 4.1, which does not allow saving even intermediate files. As it turned out, it would not let me save a training file I had constructed myself, which included some CCAT tricks, such as substitute readings for the Gothic MT or for the critical apparatus siglum that is a circle with a dot inside. If I could have saved a first-pass training file and then revised it in subsequent runs, I might have produced a passable reading of the app crit in stages. The stages I envisioned were:
1) Read the Greek, skip the English words, correct the numbers -- SAVE the training file
2) Read, using Greek/English, the English words in the same file -- SAVE the same training file.
3) Run the resulting training file for OCR and correct the results. -- SAVE the results
4) Possibly darken or correct the underlying text to enhance OCR and SAVE.
When CCAT used one of the original Kurzweil scanners, it was possible to train it, through iterations and cleverness, to read Hebrew and Greek texts as Michigan-Claremont and Beta code. What they did was train the scanner/OCR to misread each Hebrew or Greek character as the English (Latin) letter closest to it in appearance, then run search-and-replace programs to substitute the other English (Latin) letters that actually represented the underlying Greek or Hebrew text.
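To make the second half of that trick concrete, here is a minimal sketch in Python of the search-and-replace pass for Greek. It is not CCAT's actual program. The Beta Code letters are the standard ones, but the look-alike pairings (which Latin letter the scanner was taught to emit for which Greek letter) and the names LOOKALIKE_TO_BETA and lookalike_to_beta are my own assumptions for illustration.

```python
# Hypothetical look-alike table. Key: the Latin letter the OCR was trained to
# emit because it resembles the Greek letter on the page. Value: the Beta Code
# letter for that Greek letter. The pairings are illustrative assumptions.
LOOKALIKE_TO_BETA = str.maketrans({
    "a": "A",  # alpha
    "e": "E",  # epsilon
    "n": "H",  # eta resembles Latin n
    "i": "I",  # iota
    "k": "K",  # kappa
    "v": "N",  # nu resembles Latin v
    "o": "O",  # omicron
    "p": "R",  # rho resembles Latin p
    "t": "T",  # tau
    "u": "U",  # upsilon resembles Latin u
    "x": "X",  # chi
    "w": "W",  # omega resembles Latin w
})


def lookalike_to_beta(ocr_text):
    """Replace each look-alike Latin letter with its Beta Code letter.

    str.translate makes a single pass over the text, so every character is
    substituted at most once. Characters not in the table (digits, spaces,
    punctuation) pass through unchanged.
    """
    return ocr_text.translate(LOOKALIKE_TO_BETA)


# The scanner "misreads" the Greek word kairon as the Latin string kaipov.
print(lookalike_to_beta("kaipov"))  # prints KAIRON
```

Accents, breathings, and final sigma are left out of the sketch; a real table would need entries for those as well.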
It would be nice to be able to do that more directly in Unicode. I think the combination of Ancient Greek and Latin letters in the apparatus would defeat the old CCAT approach, though.
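A small sketch of what I mean, again in Python and again under assumptions of my own: the look-alike pairings, the names LOOKALIKE_TO_GREEK and lookalike_to_unicode, and the sample line are all made up for illustration and come from no real apparatus. The table maps the misread Latin letters straight to Unicode Greek, and the sample shows where the approach breaks down once genuine Latin text shares the line.

```python
# Hypothetical table from look-alike Latin letters straight to Unicode Greek.
# The pairings are illustrative assumptions, not a tested training set.
LOOKALIKE_TO_GREEK = str.maketrans({
    "a": "α", "e": "ε", "n": "η", "i": "ι", "k": "κ", "v": "ν",
    "o": "ο", "p": "ρ", "t": "τ", "u": "υ", "x": "χ", "w": "ω",
})


def lookalike_to_unicode(ocr_text):
    """Replace each look-alike Latin letter with a Unicode Greek letter."""
    return ocr_text.translate(LOOKALIKE_TO_GREEK)


# Made-up sample: a Greek word (misread as "tov") followed by the Latin
# abbreviation "om", as an apparatus might mix them on one line.
sample = "tov om"
converted = lookalike_to_unicode(sample)
print(converted)

# The Greek word comes out right, but the Latin "o" in "om" has also been
# turned into a Greek omicron, because the table cannot tell the two apart.
print(converted.split()[1][0] == "ο")  # True: first letter is now Greek
```

Any fix would need some way of marking which stretches of the line are Greek and which are Latin before the substitution runs, which is exactly what the old approach lacked.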
Anagnostis *looks* like a very sophisticated program. But without being able to save intermediate files -- say for two days -- there is no way to evaluate completely how usable Anagnostis is for one's scholarly purposes. What I could tell was that Anagnostis 4.1 did a passable job of distinguishing Ancient Greek letters from Arabic numerals.
Sigrid Peterson
CCAT/CATSS Variants Project
petersig@ccat.sas.upenn.edu