I invite everyone to test my latest creation, a new Latin Macronizer:
http://stp.lingfil.uu.se/~jowi4905/macronizer/
Apart from marking long vowels, it also has the ability to convert the orthography to use v and/or j, if so desired.
This project was at least partly inspired by Felipe Vogel’s Māccer program. As I see it, Māccer has two drawbacks that I wanted to overcome: it relies on macronized texts for training, and (as far as I understand) it doesn’t take the context of an ambiguous word into consideration. My macronizer instead gets its information about vowel lengths from a morphological analyzer (Morpheus, of the Perseus Project). To choose between ambiguous forms, the text is tagged with a part-of-speech (POS) tagger, trained on the Latin Dependency Treebank.
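To make the disambiguation idea concrete, here is a minimal Python sketch of how a POS tag could be used to pick between candidate macronized forms. It is only an illustration, not the macronizer's actual code: the CANDIDATES table, the tag names and the macronize function are all made up for this example, and in the real tool the candidates would come from the morphological analyzer and the tags from the trained tagger.

```python
# Hypothetical candidate table: each unmarked word maps to a list of
# (macronized form, POS tag) pairs. The entries and tag names are
# assumptions made for this illustration only.
CANDIDATES = {
    "miseris": [
        ("mīseris", "VERB"),  # a form of mittō
        ("miserīs", "ADJ"),   # a form of miser
    ],
    "paria": [
        ("paria", "ADJ"),
    ],
}


def macronize(tagged_tokens):
    """Choose a macronized form for each (word, tag) pair.

    The first candidate whose tag matches the tagger's tag is used.
    If no candidate matches, the first candidate is used as a fallback.
    Words missing from the table are returned unchanged.
    Lookups are case-sensitive in this sketch.
    """
    result = []
    for word, tag in tagged_tokens:
        candidates = CANDIDATES.get(word, [])
        chosen = next((form for form, t in candidates if t == tag), None)
        if chosen is None:
            chosen = candidates[0][0] if candidates else word
        result.append(chosen)
    return result


if __name__ == "__main__":
    print(macronize([("miseris", "VERB"), ("miseris", "ADJ"), ("et", "CONJ")]))
```

Note that the tag alone cannot separate two candidates that share the same part of speech, which is one reason a perfect result should not be expected.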
Any suggestions for improvements are very welcome. I’m especially interested in reports of faults that are due to errors in the Morpheus lexicon. Short of true artificial intelligence, it is of course unreasonable to expect a perfect result from the macronizer, so if you find that it somewhere mixes up, say, mīseris and miserīs, that is probably to be expected, and there is little I can do about it. If, however, you notice a macronized word form which shouldn’t exist, or which is very rare and thus highly improbable compared to the correct form, then please report it. (For example, I noticed just today that paria is often erroneously marked as parīa, which is an alternative form of parēas, a kind of snake! That will be corrected.)