I recently purchased ReadIris Pro 12 (Build 5644). I got it for $65 instead of the usual $129 through an HP offer that came with my printer. It took a little while to figure out the interface (I think OmniPage Pro has a little more depth with regard to saving the layout, etc.), but Abbott-Smith’s lexicon has a very simple format: a title header and the text body.
I am using a PDF version of Abbott-Smith’s 1922 lexicon. I have two PDF versions. The one I purchased is clear (no bleeding of the text, as is common in some old books). This version, however, has some pages with a clear white background and others that are off-colored (sort of ivory). Kind of odd for a PDF. The other PDF is the one from Google Books. The background of every page is white, but the text bleeds.
I used Adobe Acrobat Pro 9 to extract each page as a separate PDF and then tried to open it with ReadIris. ReadIris did not like the pages with the ivory background. I had to tweak the resolution settings to get the text to show; otherwise it opened the image as a blank page. I never got the ivory pages to open directly with text from the PDF file. I had to convert them to TIFF files first, and then ReadIris opened them fine.
After configuring a new dictionary to use for scanning, I began to train the software. One of the problems with Abbott-Smith is that it also contains Hebrew characters. ReadIris has a Middle Eastern edition, but the version I am using does not support Hebrew. It did manage to learn some of the characters but still struggles with the diacritics. The bleeding text was too difficult to train the software with, so I gave up on the Google Books copy.
One of the problems with ReadIris is that while the image window for the ‘unknown text’ is very large, the field for the character you can change is extremely small (perhaps font size 8), which leaves Greek and Hebrew diacritics minuscule. It is almost impossible to tell what ReadIris thought the character was or what you typed in. ReadIris returns either a single character or a mix of characters, depending on whether it 1) thinks it knows the glyph (e.g. ‘ρ’, or ‘p’ if it is wrong) or 2) does not know it (a blank). Some double glyphs were returned with the correct suggestions; the old Latin double letters like ff, AE, etc. almost always came back correct.
Several letters in Abbott-Smith are printed in two parts (h, w, n). ReadIris never got these right and treated each one as separate letters; there was no option to merge two glyphs into a single glyph. I had to hope that ReadIris would occasionally catch one and give me both parts of the same letter at once, which it did once in a while. But it still never learned them. This is a feature that should be added to the software.
ReadIris was very accurate with numbers. Abbott-Smith uses a mixture of large numbers for the chapters and smaller raised numbers for the verses. These were almost always recognized correctly, although sometimes ReadIris kept them raised and other times turned them into large numbers. Some of the Greek alpha forms and their combinations (ἀ ἁ ἂ ἃ ἆ ᾶ etc.) ended up in font size 22, while others came out at size 10. I could not figure this out; it happened in the Word document produced by the Save as Word option.
As far as learning Greek goes, ReadIris got the simple letters right and almost always got the vowels with acute accents right, but it could not get a handle on the various diacritics that appear over alpha, even after numerous rounds of training (I went through about six pages of Abbott-Smith character by character).
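When checking what the OCR actually produced over an alpha, it can help to look at the characters in Unicode terms. Each of the alpha forms mentioned earlier is a single precomposed character that decomposes (Unicode NFD normalization) into a plain alpha plus one or more combining marks. The sketch below uses only Python's standard `unicodedata` module and assumes the OCR result has been saved as Unicode text. It is not a ReadIris feature, just one way to inspect which breathing and accent marks ended up in the output; the names `ALPHA_FORMS` and `decompose` are illustrative.

```python
import unicodedata

# The alpha forms mentioned in the review, written as escapes so the
# precomposed code points are unambiguous: ἀ ἁ ἂ ἃ ἆ ᾶ
ALPHA_FORMS = ["\u1f00", "\u1f01", "\u1f02", "\u1f03", "\u1f06", "\u1fb6"]


def decompose(ch: str) -> list[str]:
    """Split a character into its base letter and combining marks (NFD)
    and return the Unicode name of each part."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]


if __name__ == "__main__":
    for form in ALPHA_FORMS:
        print(form, "->", ", ".join(decompose(form)))
```

For example, ἂ comes apart as alpha + COMBINING COMMA ABOVE (smooth breathing) + COMBINING GRAVE ACCENT, so a wrong breathing or accent in the recognized text shows up as a different combining-mark name.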
So I plan on doing a little more training, but I am going to take a look at Tesseract when I get a chance.
Louis