att'n computer geeks

Textkit is a learning community: introduce yourself here. Use the Open Board to introduce yourself, chat about off-topic issues and get to know each other.
adz000
Textkit Member
Posts: 162
Joined: Mon May 19, 2003 9:45 pm
Location: Cantabrigiae Massachusettensium


Post by adz000 »

Hi,

Now and again I see posts on textkit by people who clearly have a lot of experience with computers, detailing this or that project they're interested in doing. I have a bit of experience myself, though I'm somewhat out of practice. I wanted to list some projects that interest me and which I think represent some ways in which computers in particular can advance our understanding of these dead languages. I'd like feedback on everything.

1) Collaborative commentaries on texts. I know this has begun in some places (notably the Suidas at http://www.stoa.org/sol/, and there are also many texts collaboratively translated, like the Homeric scholia and St. Jerome's Chronicon). I think much more can be done, especially with more mainstream literature. The chief advantages of collaborative commentaries are pretty apparent, so I won't dwell on them at length; suffice it to say that the project would be great for commentator and reader alike, not to mention fun for all of us who are doing dead languages outside an academic setting and still have bits and pieces to contribute.

I would like some kind of standard interface that would allow for communal annotation and be adaptable to many different texts. I imagine that every aspect of the text ought to be up for grabs, including the apparatus criticus, and I hope that a certain amount of energy will even go into textual criticism. Different levels of scholarly attention can be maintained by having general editors who have the power to vet entries and reformulate them as they wish.

2) Prosody. I'd like a program that can scan Latin and Greek texts, both poetry and prose, at a sophisticated level, and that can provide information on specific lines as well as overall statistics on usage. Corpus computing can automate counts like these, which used to be done by hand, so I'm surprised there haven't been any public-domain attempts to do so. Because of the breadth of texts available, we can get more precise (and on-the-fly) information about specific authors' usage. The lengths of the vowels in most words can be found by looking them up in Perseus, and the rest can be generated by following a relatively simple set of rules, such as those found for Latin in the section on prosody in Gildersleeve's grammar. The variety and license of Greek verse make this a more daunting prospect in that language, but perhaps a more fulfilling one as well. I'm particularly interested in using a Latin tool to get a more complete picture of Latin prose usage.
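A minimal sketch of the rule-based step for dactylic hexameter only, assuming the hard part is already done: each line has been divided into syllables and each syllable marked 'L' (long), 'S' (short) or 'X' (doubtful, e.g. a short vowel before mute + liquid). The function names are made up for illustration; this is not Perseus code. It tries a dactyl or spondee in feet 1 to 5 and a long plus anceps in foot 6, and returns every division that works, so lines that fail to scan, or scan more than one way, can be flagged for a human.

```python
from collections import Counter

# Foot templates: 'L' = long (heavy) syllable, 'S' = short (light) syllable.
DACTYL = "LSS"
SPONDEE = "LL"


def fits(weights: str, pos: int, foot: str) -> bool:
    """True if the syllables starting at `pos` can fill `foot`.
    'X' (doubtful quantity) may count as either long or short."""
    segment = weights[pos:pos + len(foot)]
    return len(segment) == len(foot) and all(
        w == f or w == "X" for w, f in zip(segment, foot)
    )


def scan_hexameter(weights: str) -> list[list[str]]:
    """Return every division of a line's syllable weights into six feet:
    feet 1-5 dactyl or spondee, foot 6 a long syllable plus an anceps."""
    parses: list[list[str]] = []

    def walk(pos: int, feet: list[str]) -> None:
        if len(feet) == 5:
            rest = weights[pos:]
            # Final foot: exactly two syllables, the first long, the last anceps.
            if len(rest) == 2 and rest[0] in "LX":
                parses.append(feet + ["L" + rest[1]])
            return
        for foot in (DACTYL, SPONDEE):
            if fits(weights, pos, foot):
                walk(pos + len(foot), feet + [foot])

    walk(0, [])
    return parses


def pattern(parse: list[str]) -> str:
    """Summarise the first four feet as the usual D/S string, e.g. 'DDSS'."""
    return "".join("D" if foot == DACTYL else "S" for foot in parse[:4])


def pattern_counts(lines: list[str]) -> Counter:
    """Tally D/S patterns over many lines, skipping unscannable or ambiguous ones."""
    counts: Counter = Counter()
    for weights in lines:
        parses = scan_hexameter(weights)
        if len(parses) == 1:
            counts[pattern(parses[0])] += 1
    return counts
```

For example, Aeneid 1.1 (Arma virumque cano, Troiae qui primus ab oris) marks up as LSSLSSLLLLLSSLL and comes back as a single parse with the pattern DDSS; running pattern_counts over a whole book would give the kind of per-author statistics described above.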

3) Morphological analysis. Perseus has done this, but the code isn't freely available (or is it?). This is a rather off-the-wall idea, but using some kind of natural language processing techniques we could have a program "learn" Latin/Greek by making successive queries to the Perseus Morpheus engine. The traffic generated might make the folks at Perseus upset at us, but it would be an interesting experiment in language programming. But the real advantage would be having a local program that can parse Greek/Latin. Eventually we could use this to add search features that neither Perseus nor TLL interfaces have; namely to do more sophisticated syntactical searches. "Find me every [likely] instance in which this aorist passive verb takes a direct object and is also coupled with this preposition taking the dative case" etc.
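Assuming a text has already been morphologically tagged (by Morpheus or anything else), the "likely instance" search could start as a crude proximity filter like the one below. The Token format, tag names and six-word window are all assumptions for illustration; nothing here parses syntax, so an accusative nearby is only a hint that it is the verb's object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str
    lemma: str
    pos: str                        # hypothetical tag set: "verb", "noun", "art", "prep", ...
    tags: frozenset = frozenset()   # hypothetical feature tags, e.g. {"aor", "pass"}


def find_candidates(sentence, verb_lemma, prep_lemma, window=6):
    """Yield indices of aorist passive forms of `verb_lemma` that have, within
    `window` tokens on either side, (a) an accusative noun and (b) `prep_lemma`
    immediately followed by a dative word. Proximity only, not syntax."""
    for i, tok in enumerate(sentence):
        if tok.lemma != verb_lemma or not {"aor", "pass"} <= tok.tags:
            continue
        near = sentence[max(0, i - window): i + window + 1]
        has_acc = any(t.pos == "noun" and "acc" in t.tags for t in near)
        has_prep_dat = any(
            a.lemma == prep_lemma and "dat" in b.tags
            for a, b in zip(near, near[1:])
        )
        if has_acc and has_prep_dat:
            yield i


# "he was taught the craft in the city": aorist passive with a retained accusative
example = [
    Token("ἐδιδάχθη", "διδάσκω", "verb", frozenset({"aor", "pass", "ind"})),
    Token("τὴν", "ὁ", "art", frozenset({"acc", "sg"})),
    Token("τέχνην", "τέχνη", "noun", frozenset({"acc", "sg"})),
    Token("ἐν", "ἐν", "prep"),
    Token("τῇ", "ὁ", "art", frozenset({"dat", "sg"})),
    Token("πόλει", "πόλις", "noun", frozenset({"dat", "sg"})),
]
print(list(find_candidates(example, "διδάσκω", "ἐν")))  # [0]
```

The filter over-generates by design: it finds candidates for a person to check rather than answering the question outright.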

Cheers,
Adam

chad
Textkit Zealot
Posts: 757
Joined: Tue Jul 22, 2003 2:55 am

Post by chad »

hi adam, i think it'd be really hard to automate scansion of greek texts. there are lots of factors involved: exactly the same syllables can be scanned differently in different lines, authors and genres. e.g. sometimes internal correption applies and sometimes it doesn't; epic correption doesn't apply in drama, or only very rarely; and a short vowel before a mute + liquid sometimes makes a short syllable, sometimes a long one... the scansion of a syllable depends on e.g.

the position in the foot;
whether it's in iambics, anapaests, lyric;
poetic genre (epic, drama, lyric);
in the context of drama, tragedy or comedy (the same syllables can be scanned differently in these 2 types); &c &c.

The automation of scansion provided by Perseus is the best way to start, but you need to know a lot about the other factors involved to tell whether you're doing it right in each case. i'm just thinking out loud because i scan all the texts i read: before, i used perseus and the rules of prosody (when beginning with epic), but now i've had to collect concordances of homer, aristophanes, euripides, aeschylus &c to know i'm scanning properly. i need to check words i've scanned before because they scan differently in different situations. i don't know if an algorithm or program could cover it, although i guess if people can do the job, software could too; i don't know anything about these computer contraptions nowadays :)
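The basic rules of position can be coded, but much of the difficulty lies in what they leave open. In this hypothetical helper (the name and letter sets are assumptions), a short vowel before mute + liquid comes back as 'X' rather than being guessed, so a later stage, or a person, has to decide by genre and author. It deliberately does not model correption or the finer differences between particular mute + liquid combinations.

```python
# Greek consonant classes used by the position rules (simplified assumption).
MUTES = set("πβφτδθκγχ")    # stops
LIQUIDS = set("λρμν")        # liquids and nasals, for the mute + liquid rule
DOUBLE = set("ζξψ")          # each counts as two consonants


def syllable_weight(vowel_long: bool, following: str) -> str:
    """Weight of a syllable given its vowel quantity and the consonants that
    follow before the next vowel (including across a word break).
    Returns 'L' (long), 'S' (short) or 'X' (doubtful: short vowel + mute + liquid).
    Correption and genre-specific treatment are NOT modelled."""
    if vowel_long:                      # long vowel or diphthong: long by nature
        return "L"
    count = sum(2 if c in DOUBLE else 1 for c in following)
    if count < 2:                       # short vowel before 0 or 1 consonant
        return "S"
    if len(following) == 2 and following[0] in MUTES and following[1] in LIQUIDS:
        return "X"                      # may scan either way; leave it open
    return "L"                          # long by position
```

Lines containing 'X' syllables that can be scanned more than one way are exactly the cases that still have to be checked against concordances by hand.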

adz000
Textkit Member
Posts: 162
Joined: Mon May 19, 2003 9:45 pm
Location: Cantabrigiae Massachusettensium

Post by adz000 »

Thanks for the reply!

Greek verse would be a challenge, you're right. I suppose I'd do best to treat Latin as a starting point and Greek as a distant endpoint. The regularity of Latin scansion, and the Romans' scholastic interpretation of Greek meter, make me think that Latin would be a lot easier to do (and if we leave out Horace, we're really only talking about hexameters, pentameters, and iambic trimeters). Of course there will probably be stumbling blocks; it might be best to skip Plautus entirely, though the usefulness would then decrease accordingly. But there's so much information available that there must be some good way to take advantage of it! I don't have in mind a contraption to make the lives of schoolboys easier, but something that would hopefully tell us things we didn't know about how verse usage varies. We already know a lot about the usage of the chief authors (Homer comes to mind), but a program of general application might be a desideratum.

chad
Textkit Zealot
Posts: 757
Joined: Tue Jul 22, 2003 2:55 am

Post by chad »

hi adam, it's funny you should mention that; i was just talking to will last week about writing an article giving the scansion and commentary for the opening iambic speeches of the great greek dramatists. sophocles and aeschylus rarely resolve longs and rarely break the iambic pattern, which gives a graver sound; euripides resolves far more often, giving his dramas a quicker and more intense tragic emotion; aristophanes' iambics are full of resolutions and also anapaests in any foot except the last (tragedy can't have an anapaest in the 2nd or 4th foot)... i think this gives aristophanes the most natural-sounding language of the dramatists. you can really hear the difference between these 3 types, even though they were all iambic dramatists writing at roughly the same time. unfortunately i'm too busy to write it, though; it's all just written up in my OCTs at home... oh well, no great loss :)

if someone figured out how to program an automatic scanner i'd be very appreciative :)
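Once lines are scanned, the comparison between the dramatists is mostly counting. A sketch, assuming each trimeter has already been divided into six feet of 'L'/'S' syllables (the function name and input format are hypothetical): it reports resolutions per line and flags anapaests in the 2nd or 4th foot, which, as noted above, tragedy does not allow. It takes the feet as given and does not treat the anceps final syllable specially.

```python
# Resolved foot shapes in iambic trimeter, written as L (long) / S (short).
TRIBRACH, DACTYL, ANAPAEST = "SSS", "LSS", "SSL"
RESOLVED = {TRIBRACH, DACTYL, ANAPAEST}


def trimeter_stats(lines: list[list[str]]) -> tuple[float, list[tuple[int, int]]]:
    """lines: each trimeter already divided into six feet, e.g.
    ["SL", "LL", "SL", "SSS", "LL", "SL"].
    Returns (resolved feet per line, [(line index, foot number), ...] for
    anapaests in the 2nd or 4th foot)."""
    if not lines:
        return 0.0, []
    resolutions = 0
    even_anapaests = []
    for n, feet in enumerate(lines):
        resolutions += sum(foot in RESOLVED for foot in feet)
        for number, foot in enumerate(feet, start=1):
            if foot == ANAPAEST and number in (2, 4):
                even_anapaests.append((n, number))
    return resolutions / len(lines), even_anapaests
```

Run separately over a Sophocles prologue and an Aristophanes prologue, the first number would give each one's resolution rate, and the second list is where any tragic exceptions (or scanning mistakes) would show up.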

annis
Textkit Zealot
Posts: 3399
Joined: Fri Jan 03, 2003 4:55 pm
Location: Madison, WI, USA

Re: att'n computer geeks

Post by annis »

adz000 wrote: 1) Collaborative commentaries on texts. I know this has begun in some places (notably the Suidas at http://www.stoa.org/sol/ and there are also many texts collaboratively translated like the Homeric scholia and St. Jerome's Chronicon). I think that much more can be done especially with more mainstream literature. The chief advantages in collaborative commentaries are pretty apparent so I won't dwell on them at length;
I have from time to time thought about doing something like this on Aoidoi.org. The very simplest option would be to use a Wiki, but that's not ideal. The new layout in the Perseus beta is, I think, the best way to organize comments on a text, and I know of no Wiki that would cope with that well.

One thing I like about the Suda software is that it understands that some people are experts. The usual Wiki has no conception of that, and this can be a problem. I have no idea what sort of email Jeff gets, but given the historical role of Greece and Rome in most of the Western World's conception of itself, there are a lot of people with some amazing theories about Greek and Latin. And then there are the "All Languages are Descended from My Country's Major Language" people (Armenian, Turkish and Greek are popular candidates for The Mother of All Languages right now). These people send me email sometimes. They should not be doing commentaries on Homer. You address this below, but I wanted to make clear that loose access control could be a real problem.
adz000 wrote: I would like some kind of standard interface that would allow for communal annotation and be adaptable to many different texts. I imagine that every aspect of the text ought to be up-for-grabs including the apparatus criticus and I hope that a certain amount of energy will even go into textual criticism.
People would need to be able to switch off bits they don't want to see.

Doing an app crit would be a huge undertaking. I shudder to think what the TEI DTD for this looks like.

I think it would be very useful for beginners to be able to add questions to a line, which could then be the basis for a commentary note.
adz000 wrote: Different levels of scholarly attention can be maintained by having general editors who have the power to vet entries and reformulate them as they wish.
Absolutely. Emphasis on the plural: editors.
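The vetting discussed here, with several editors, beginners' questions on a line and readers switching off what they don't want to see, could be modelled along these lines. This is a hypothetical data model, not the Suda software or any wiki's: submissions stay hidden until an editor approves them, editors may reformulate an entry while approving it, and readers can hide whole kinds of annotation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Kind(Enum):
    NOTE = "note"
    QUESTION = "question"      # a beginner's question on a line
    APP_CRIT = "app_crit"      # an entry in the apparatus criticus


@dataclass
class Annotation:
    line_ref: str              # hypothetical reference format, e.g. "Il. 1.1"
    author: str
    kind: Kind
    text: str
    approved: bool = False


@dataclass
class Commentary:
    editors: set[str]          # plural: several general editors
    notes: list[Annotation] = field(default_factory=list)

    def submit(self, note: Annotation) -> None:
        """Anyone may submit; nothing is shown until an editor approves it."""
        self.notes.append(note)

    def approve(self, editor: str, note: Annotation, revised: Optional[str] = None) -> None:
        """Editors vet entries and may reformulate them while approving."""
        if editor not in self.editors:
            raise PermissionError(f"{editor} is not an editor")
        if revised is not None:
            note.text = revised
        note.approved = True

    def view(self, line_ref: str, hide: frozenset = frozenset()) -> list[str]:
        """Approved notes on one line, minus any kinds the reader switched off."""
        return [
            n.text
            for n in self.notes
            if n.line_ref == line_ref and n.approved and n.kind not in hide
        ]
```

Keeping approval separate from submission is what lets the site stay open to everyone while the editors decide what readers actually see.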
adz000 wrote: 3) Morphological analysis. Perseus has done this, but the code isn't freely available (or is it?).
Even if that software is freely available, morphological analysis isn't much use without a dictionary, and I'm sure the dictionaries are not free.
adz000 wrote: ...but using some kind of natural language processing techniques we could have a program "learn" Latin/Greek by making successive queries to the Perseus Morpheus engine. The traffic generated might make the folks at Perseus upset at us,
That would surely make some poor Unix sysadmin very angry. In general, one wants to avoid that.
William S. Annis, http://www.aoidoi.org/, http://www.scholiastae.org/
τίς πατέρ' αἰνήσει εἰ μὴ κακοδαίμονες υἱοί;

klewlis
Global Moderator
Posts: 1668
Joined: Tue Jul 29, 2003 1:48 pm
Location: Vancouver, Canada

Post by klewlis »

adz000 wrote: "Find me every [likely] instance in which this aorist passive verb takes a direct object and is also coupled with this preposition taking the dative case" etc.
A program like this exists for biblical greek. It is called Gramcord and is widely used among biblical scholars. I am told that you can get quite detailed in the searches.

I happen to have the Logos software for that. Its searches are not as detailed as Gramcord's, but it suits my purposes quite well.
First say to yourself what you would be; then do what you need to do. ~Epictetus

Lisa
Textkit Neophyte
Posts: 27
Joined: Thu May 22, 2003 8:38 pm
Location: Somerville, MA

Re: att'n computer geeks

Post by Lisa »

adz000 wrote: 3) Morphological analysis. Perseus has done this, but the code isn't freely available (or is it?). This is a rather off-the-wall idea, but using some kind of natural language processing techniques we could have a program "learn" Latin/Greek by making successive queries to the Perseus Morpheus engine. The traffic generated might make the folks at Perseus upset at us, but it would be an interesting experiment in language programming. But the real advantage would be having a local program that can parse Greek/Latin. Eventually we could use this to add search features that neither Perseus nor TLL interfaces have; namely to do more sophisticated syntactical searches. "Find me every [likely] instance in which this aorist passive verb takes a direct object and is also coupled with this preposition taking the dative case" etc.
Hi, there's a lot here, but I'll start with the last point.
There are, of course, other projects which have undertaken the disambiguation task (e.g. the Vergil Project at UPenn) and paved the way for community-based editing, which is where we are headed.

We are already working on a system whereby users can disambiguate a text with a voting system. Rather than have the computer try to learn the language by ingesting lots of data, processing it, and spitting it back out again, why not attach more information to the texts and words themselves? (Since Perseus' corpus is limited, I doubt it would serve well as the basis for teaching a computer either Greek or Latin, even if we had the unlimited resources to do so.)
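One simple way to turn user votes into an accepted reading is a threshold on agreement. The class below is a hypothetical sketch, not how the Perseus system works: each user gets one vote per token, and an analysis is accepted only once enough people have voted and a large enough share of them agree (the 3-vote and 75% thresholds are arbitrary assumptions).

```python
from collections import Counter, defaultdict
from typing import Optional


class VoteBoard:
    """Users vote on the correct morphological analysis of each token; an
    analysis is accepted only with enough votes and enough agreement."""

    def __init__(self, min_votes: int = 3, min_share: float = 0.75) -> None:
        self.min_votes = min_votes      # arbitrary threshold (assumption)
        self.min_share = min_share      # arbitrary threshold (assumption)
        self.votes = defaultdict(dict)  # token id -> {user: analysis}

    def vote(self, token_id: str, user: str, analysis: str) -> None:
        # One vote per user per token; voting again replaces the earlier vote.
        self.votes[token_id][user] = analysis

    def resolved(self, token_id: str) -> Optional[str]:
        """The accepted analysis, or None while the question is still open."""
        tally = Counter(self.votes.get(token_id, {}).values())
        total = sum(tally.values())
        if total < self.min_votes:
            return None
        analysis, count = tally.most_common(1)[0]
        return analysis if count / total >= self.min_share else None
```

Tokens that never reach agreement are themselves useful output: they mark the genuinely ambiguous spots that an editor, or a commentary note, should address.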

We are also working towards user community based commentaries, as was noted, and a Perseus Wiki-interface. This work is underway already. Our current endeavors are all with an eye toward opening up the materials we have in ways which were not possible on the WWW eight years ago.

Best,
Lisa

Democritus
Textkit Fan
Posts: 331
Joined: Fri May 07, 2004 12:14 am
Location: California

Re: att'n computer geeks

Post by Democritus »

adz000 wrote: "Find me every [likely] instance in which this aorist passive verb takes a direct object and is also coupled with this preposition taking the dative case" etc.
Is there any software which can do that for modern languages? That seems like a rather hard problem. Detecting a preposition with a dative case might not be too hard, but reliably detecting a direct object might be more difficult than it seems at first. Lots of deep ambiguity there, and resolving it would require some knowledge of the world, not just of grammar. If both subject and object are neuter singular, the software would have a hard time figuring out which is which. I would be surprised to hear about software which was reliable about parsing on that level. (Somebody surprise me, please.) :)

Does anyone have any idea what the total data size of the entire classical corpus is? I have no idea, but I'm guessing it is small.
Lisa wrote: We are already working on a system whereby users can disambiguate a text with a voting system. Rather than have the computer try to learn the language by ingesting lots of data, processing it, and spitting it back out again, why not attach more information to the texts and words themselves?
I have not studied this problem, so I'm talking out my arse, but I suspect that this human-powered approach might be significantly easier than trying to create software to do the job. The corpus is finite. Covering the whole thing would be a lot of work, but it might turn out to be less work than successfully creating a software parser. The parser would have to be very smart indeed.

One added advantage: if we opt not to create intelligent parsers, then we won't have that pesky problem afterwards of intelligent Latin-speaking robots taking control of the Earth. :)

Kopio
Global Moderator
Posts: 789
Joined: Wed Feb 11, 2004 7:56 pm
Location: Boise, ID

Re: att'n computer geeks

Post by Kopio »

annis wrote: The new layout in the Perseus beta....
Huh?? There's a new beta?? Where can you find it at?? I poked around Perseus a bit and couldn't find anything about it....do you have a link??

Also.....I did find a nice link to Textkit in the Perseus FAQ....thought that was kinda cool!!

Turpissimus
Textkit Enthusiast
Posts: 424
Joined: Thu Jul 15, 2004 12:49 pm
Location: Romford

Post by Turpissimus »

Kopio wrote: Huh?? There's a new beta?? Where can you find it at?? I poked around Perseus a bit and couldn't find anything about it....do you have a link??
http://test.perseus.tufts.edu/hopper/index.jsp

You really should look inside "outside links of interest" sometime, Kopio. All kinds of fun in there...

Kopio
Global Moderator
Posts: 789
Joined: Wed Feb 11, 2004 7:56 pm
Location: Boise, ID

Post by Kopio »

Thanks Turp! That is kinda nifty.....I'll have to go check out the other links.

Kopio
Global Moderator
Posts: 789
Joined: Wed Feb 11, 2004 7:56 pm
Location: Boise, ID

Post by Kopio »

I must add.....I really like the interface, the font is much cleaner and pleasing to the eye. Makes looking at the Odyssey much more enjoyable (I dislike reading blue hyperlinked font for page after page).

annis
Textkit Zealot
Posts: 3399
Joined: Fri Jan 03, 2003 4:55 pm
Location: Madison, WI, USA

Post by annis »

Right this moment, I would be the happiest grammar-nerd in the world if I could tell a computer "tell me all places in Homer where γε follows an adjective."
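Given a morphologically tagged text of Homer (the real obstacle, and an assumption here), this query reduces to a simple bigram filter. The (form, part-of-speech) pair format and the set of spellings counted as γε, including elided and accented forms, are assumptions; a real text would need its apostrophes and accents normalised consistently first.

```python
# Spellings treated as γε; elided forms vary by edition (assumption).
GE_FORMS = {"γε", "γέ", "γ'", "γ’", "γ᾽"}


def ge_after_adjective(tokens: list[tuple[str, str]]) -> list[int]:
    """Return indices of γε whose immediately preceding token is tagged 'adj'."""
    return [
        i
        for i in range(1, len(tokens))
        if tokens[i][0] in GE_FORMS and tokens[i - 1][1] == "adj"
    ]


# A made-up three-word tagged fragment, not a line of Homer.
sample = [("ἐσθλός", "adj"), ("γε", "particle"), ("ἀνήρ", "noun")]
print(ge_after_adjective(sample))  # [1]
```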
