I’ve written a script for adding macrons to Latin texts. It’s not as straightforward as that, of course, but it should be helpful for unambiguous forms at least. More info and download: http://fps-vogel.github.io/tools/ (under “the Macronizer”–if any links are missing let me know, I put up the site just now).
The problem is that I’ve been unsuccessful in finding texts that already have macrons, which I need in order to teach the program the word forms. I knew that macronized texts in plain-text format are basically nonexistent online, but I assumed I could OCR texts in PDF files–but it turns out OCR software can’t handle Latin text with macrons (except possibly FineReader, which is expensive).
Does anyone know of such texts online in text (not PDF) format? If I don’t find anything, I could simply teach the program the forms as I format texts by hand… but lots of manual formatting is what I’m trying to avoid in the first place.
EDIT: If anyone knows of good-quality scans of public-domain texts with accurate marking of vowel length, I would consider working on those as well. I’d actually prefer those, since I could freely share them afterwards.
I decided not to work on LLPSI, since it will require just as much correction as any OCR-ed text, and in the end I wouldn’t be able to share it. But I’m thinking of working on Puer Romanus instead, even though it’s shorter and would require more correction since the PDF (on Archive) is not as good.
Phil, I’ve sent you a DM. Chris Francese of Dickinson College sent me a spreadsheet of Henry Frieze’s Vergilian dictionary, whose macrons were OCR’d and hand-corrected by Derek Frymark (column D).
He adds that Derek is also working on OCR macron correction for the first six Books of the Aeneid, based on Knapp’s edition. He may have some preliminary versions to share with you already.
Thanks, Paul. Did you mean that you sent me a PM? If so, it seems I didn’t receive it. I’ll get in touch with Dr. Francese for sure. I had never heard of Dickinson College. It looks like he’s doing good work over there–that series of online texts is great.
This is a great idea, but a warning: you need to be very careful with any macronised text you find. Unfortunately, they are all rife with errors, and I have seen and read just about every one currently scanned by Google etc.
There is one massive source of error: the confusion between syllables long by position (marked as such in the Gradus Ad Parnassum for metrical purposes) and a macron that marks pronounced vowel length, i.e. syllables long by nature. Add to that the disputes that reign over words with hidden quantity.
Any macronised text you use to ‘teach’ a macroniser program will need to be gone over and checked meticulously, a word at a time, otherwise you will get a classic case of ‘garbage in - garbage out’.
To add to the confusion, almost every macronised text I have seen is internally inconsistent in its application of macrons.
Thanks, Evan, for the words of caution. I’ve seen these unfortunate complications for myself in the past few weeks as I’ve been gathering these texts. I’m working on ways to counteract them, mainly:
(1) a script that helps in applying hidden macrons to a text, using lots of find-and-replace strings, e.g. “stell” → “stēll” for “stēlla”, etc. (but these would have to be checked afterwards, as for example “pactus” can have a long “a” or not, depending on what verb it is derived from), and
(2) detailed records produced when the program “learns” words from a text: all new words similar to known words but with different vowel lengths (e.g. “comīs” and “cōmis”) will be listed so that the user can check and if necessary correct them. (Alternate spellings complicate things, but at least the simple differences such as intervocalic i/j and “cu”/“quu” can be handled automatically.)
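A rough sketch of both ideas in Python, to make them concrete. This is not the actual script: the replacement table, the ambiguous-stem list, and every function name here are hypothetical, and the input is assumed to be lowercase. The sketch also matches stems only at the start of a word rather than anywhere in it, which is a deliberate departure from plain find-and-replace so that “stell” does not touch “castellum”. Spelling normalization is simplified too: every “j” becomes “i”, not just intervocalic ones.

```python
import re
import unicodedata

# Hypothetical table: unmarked stem -> stem with hidden quantity marked.
HIDDEN_QUANTITY = {"stell": "stēll"}
# Hypothetical list of stems whose length depends on the source verb;
# these are never changed automatically, only flagged for review.
AMBIGUOUS_STEMS = {"pact"}


def mark_hidden_quantity(text):
    """Return (new_text, words_to_review) for lowercase input text."""
    review = []

    def fix(match):
        word = match.group(0)
        new = word
        for plain, marked in HIDDEN_QUANTITY.items():
            # Prefix match only (an assumption of this sketch).
            if new.startswith(plain):
                new = marked + new[len(plain):]
        if new != word or any(word.startswith(s) for s in AMBIGUOUS_STEMS):
            review.append(new)
        return new

    # NFC keeps each macronized vowel as a single letter for the regex.
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"[^\W\d_]+", fix, text), review


def canonical(word):
    """Lowercase and unify spelling variants, keeping the macrons."""
    form = unicodedata.normalize("NFC", word.lower())
    return form.replace("j", "i").replace("quu", "cu")


def strip_macrons(word):
    """Remove combining macrons (U+0304) and recompose."""
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.replace("\u0304", ""))


def learn(known, new_words):
    """Add words to `known` and report vowel-length conflicts.

    `known` maps a macron-free key to the set of macronized forms seen.
    Returns a list of (new_form, existing_forms) pairs to check by hand.
    """
    report = []
    for word in new_words:
        form = canonical(word)
        variants = known.setdefault(strip_macrons(form), set())
        if variants and form not in variants:
            report.append((form, sorted(variants)))
        variants.add(form)
    return report
```

For example, after learning “cōmis” and “ejus”, a later text containing “comīs” and “eius” would produce one record for “comīs” (it differs from “cōmis” only in vowel length), while “eius” is treated as the same word as “ejus” and is not reported.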
I didn’t know what I was getting myself into, but I hope the end result will be time-saving at least.
Thanks, pmda. Yes, that is a good resource for the Aeneid.
Shortly after I was working on this tool, way back when, Johan Winge made his own macronizer, which works beautifully: http://stp.lingfil.uu.se/~winge/macronizer/index.py. After I saw it, there was no reason to continue work on mine.