I’m working on an open-source hobby project to try to present ancient Greek texts in innovative ways to help readers. As part of this project, it would be helpful if I could find machine-readable files, in the public domain or under an open-source license, that give possible English translations of Greek words. Failing that, it would help if I could find machine-readable files containing dictionary definitions. When I say machine-readable, I mean something formatted for use by a machine, not just a web page such as the markup of LSJ et al. at logeion.uchicago.edu. Does anyone know of any resource of this type? I’m especially interested in covering the Homeric vocabulary, but, e.g., Koine would be helpful too.
As an example, for a word like ξίφος, I would like to have something like this:
ξίφος sword
Obviously translations are not always one-to-one, and this will not be possible for all words. So for example, we might have:
χιτών tunic coat frock covering
And many words simply won’t have any one-word English equivalents, so, e.g., a word like ἐγγίζω, “bring near,” could be left out completely, given an explanation-style definition, given a multi-word translation, or just marked to show that it has no one-word translation.
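To make the format above concrete, here is a minimal Python sketch of how such a file could be read. The one-entry-per-line layout (headword first, then zero or more one-word glosses) follows the examples above; the function name `parse_lexicon` and the convention that a bare headword means "no one-word translation" are only assumptions for illustration.

```python
import unicodedata

def parse_lexicon(lines):
    """Map each Greek headword to a list of one-word English glosses."""
    lexicon = {}
    for line in lines:
        # Normalize so precomposed and combining accents compare equal.
        fields = unicodedata.normalize("NFC", line).split()
        if not fields:
            continue  # skip blank lines
        headword, glosses = fields[0], fields[1:]
        # An empty gloss list marks a word with no one-word equivalent.
        lexicon.setdefault(headword, []).extend(glosses)
    return lexicon

sample = [
    "ξίφος sword",
    "χιτών tunic coat frock covering",
    "ἐγγίζω",  # assumed convention: headword alone = no one-word translation
]
lexicon = parse_lexicon(sample)
```

A multi-word or explanation-style definition would need a different separator (a tab, say), which this sketch does not try to handle.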
I know this all sounds sort of picky, like I’m walking onto a used car lot and demanding a purple 1967 Mustang. Actually I would be happy with any sort of approximation to these requirements if it could save me the work of constructing such a lexicon myself, e.g., by reading LSJ and constructing entries by hand. There are also open-source software libraries designed to parse Wiktionary entries, so I could try that in order to save myself at least some labor. However, I really am not interested at all in anything that is not free and open-source. I want to keep my project completely free, and in particular I don’t want to contaminate it with anything that has a license such as CC-BY-NC, with a noncommercial clause.
Currently my main use for these data would not be to present them to humans but rather as a behind-the-scenes tool for me to do statistical analysis. That means that I don’t care too much if the data contain errors or are not authoritative. Even if as many as 50% of the entries were entirely wrong, it would still be usable for my purposes, although obviously not ideal. The current statistical task that I’m trying to accomplish is getting software to automatically associate sentences in Homer with sentences in the various public-domain English translations. I already have this working well for English-to-English, which is obviously easier. That is, my software can figure out pretty reliably that a certain sentence in Pope’s translation of the Iliad corresponds to a certain sentence in the Lang translation.
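To illustrate why a noisy lexicon is still usable for this, here is a rough Python sketch of one way to score a Greek sentence against candidate English sentences. It is not the method my software actually uses. It counts the fraction of Greek tokens that have at least one gloss appearing in the English sentence. The names `overlap_score` and `best_match` are hypothetical, the English candidate sentences are invented rather than taken from any translation, and the sketch assumes the Greek tokens have already been reduced to dictionary headwords, which inflected Homeric text would require.

```python
import re

def english_words(sentence):
    """Lowercased set of alphabetic words in an English sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def overlap_score(greek_tokens, english_sentence, lexicon):
    """Fraction of Greek tokens with some gloss present in the English."""
    if not greek_tokens:
        return 0.0
    words = english_words(english_sentence)
    hits = sum(
        1 for token in greek_tokens
        if any(gloss in words for gloss in lexicon.get(token, []))
    )
    return hits / len(greek_tokens)

def best_match(greek_tokens, candidates, lexicon):
    """Index of the candidate English sentence with the highest score."""
    return max(
        range(len(candidates)),
        key=lambda i: overlap_score(greek_tokens, candidates[i], lexicon),
    )

# Toy lexicon using the two example entries from this post.
toy_lexicon = {"ξίφος": ["sword"], "χιτών": ["tunic", "coat", "frock", "covering"]}
# Invented English sentences, for illustration only.
candidates = ["He put on his tunic.", "He drew his sharp sword."]
match_index = best_match(["ξίφος"], candidates, toy_lexicon)
```

Because the score is a proportion over many words, a wrong entry mostly adds noise rather than breaking the match, which is why even a badly flawed lexicon could be good enough.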
The best approximation I’ve found so far to what I’m talking about seems to be Ancient Greek Wordnet. There is also something called Open Ancient Greek Wordnet, which seems to be different. However, the original Wordnet and its spin-offs and copycats are not really translating dictionaries. They’re more like maps of conceptual categories and synonyms.
Another pretty interesting resource is Giles, The Odyssey of Homer : construed literally, and word for word, 1800. This is available on archive.org and consists of a Greek paraphrase of Homer (with the word-order mangled to be more like English) and English words interspersed. As an example, the opening is this:
Εννεπε declare μοι to me, Μουσα Muse, ανδρα the man …
I wouldn’t want to experience Homer by reading it in this format, but it does seem to provide a large number of word-to-word translations for Homeric Greek, in a layout that could in principle be turned into machine-readable data. However, trying to put this bilingual text through OCR software to get reasonable results seems like it would be something of a project. (I’ve used the open-source Tesseract OCR system before, and it is supposed to be possible to make it work with polytonic Greek, but it probably isn’t smart enough to tell which words in Giles are Greek and which are English.)
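If the OCR step did produce usable Unicode text, separating the Greek from the English afterward might not be too hard, since the two scripts occupy different Unicode blocks. The sketch below is only that, a sketch: it assumes clean OCR output, which is the hard part, and the helper names `is_greek` and `interlinear_pairs` are made up for illustration. It groups each Greek word with the English words that follow it.

```python
import re
import unicodedata

def is_greek(token):
    """True if every letter in the token has a Unicode name containing GREEK."""
    letters = [c for c in token if c.isalpha()]
    return bool(letters) and all(
        "GREEK" in unicodedata.name(c, "") for c in letters
    )

def interlinear_pairs(text):
    """Pair each Greek word with the run of English words that follows it."""
    # NFC keeps accented Greek letters as single characters that \w matches.
    text = unicodedata.normalize("NFC", text)
    pairs = []
    for token in re.findall(r"\w+", text):
        if is_greek(token):
            pairs.append((token, []))
        elif pairs:
            pairs[-1][1].append(token)  # English gloss for the latest Greek word
    return pairs

giles_sample = "Εννεπε declare μοι to me, Μουσα Muse, ανδρα the man …"
pairs = interlinear_pairs(giles_sample)
# pairs == [("Εννεπε", ["declare"]), ("μοι", ["to", "me"]),
#           ("Μουσα", ["Muse"]), ("ανδρα", ["the", "man"])]
```

Note that Giles prints the Greek without accents in this sample, so the resulting headwords would still need to be matched against properly accented forms.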