Adding long vowels to Whitaker's Words

Post by furrykef » Thu May 27, 2010 3:52 pm

Whitaker's Words is a popular English<->Latin dictionary. (If you just want to try out the dictionary online, not download it, you can do so here.) The main reason I don't use it much is that it doesn't mark vowel length, and I mark vowel length obsessively when I make flash cards out of Latin sentences. But I've gotten tired of consulting a physical copy of Cassell's every time I just want to know which vowels are long. Since Whitaker's Words is a dictionary, and as far as I can see nothing prevents us (legally or technically) from modifying it, it seems to me that we ought to try to fix it.

As for the scale of the project, there are approximately 39,000 words in the dictionary according to the website. (It said this of a slightly out-of-date version of the dictionary, but I'm sure the next couple of releases haven't made it significantly larger.) I think, as a group project, adding vowel length for that many entries is within reason, no?

As for where to get the vowel length information from, it should be simple to use the dictionaries that many of us already have. This shouldn't constitute copyright infringement, because a compilation of publicly available information cannot be fully copyrighted. The presentation of the information -- the exact wording and such -- can be copyrighted, but the raw information, such as which vowels are long, cannot.

There is a bit of a problem, however: people don't always agree on which vowels are long. For example, Wheelock and the Oxford Latin Desk Dictionary have "adulēscēns"; Cassell's has "adulescens" with no long vowels. (In fact, Cassell's seems to never mark length on vowels when they are followed by two consonants.) At least one Latin book I've seen -- A&G's New Latin Grammar, available on this site -- says that it's "māgnus", not "magnus", but no other book I've seen has said so, and it's quite old, so maybe general opinion has changed. (Lingua Latina did use "māgnus", in the form "māgnīne", on page 15, line 57, but I assume it was a typo, since it's "magnus" everywhere else.) I'm not fully sure how to resolve this.

Another obstacle is, of course, that humans are not perfect and may make a typo when copying. Thus, some safeguard needs to be implemented, which probably means multiple people doing the same work (so we can spot potential mistakes when two people disagree).
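The double-entry safeguard could be checked mechanically. Here is a minimal Python sketch, assuming each volunteer hands in a plain mapping from the unmarked headword to their marked form. The function names and the sample entries are hypothetical, not part of Whitaker's Words or any existing tool.

```python
import unicodedata
from typing import Dict, List, Optional, Tuple


def strip_marks(word: str) -> str:
    """Drop all combining marks, so 'adulēscēns' becomes 'adulescens'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def cross_check(
    first: Dict[str, str], second: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return every headword where the two volunteers' marked forms differ,
    or where only one of them supplied a form."""
    conflicts = {}
    for headword in sorted(set(first) | set(second)):
        a, b = first.get(headword), second.get(headword)
        # Normalize so a precomposed 'ā' and 'a' + combining macron compare equal.
        a_norm = unicodedata.normalize("NFC", a) if a is not None else None
        b_norm = unicodedata.normalize("NFC", b) if b is not None else None
        if a_norm != b_norm:
            conflicts[headword] = (a_norm, b_norm)
    return conflicts


def letter_typos(annotations: Dict[str, str]) -> List[str]:
    """Headwords whose marked form changes the letters, not just the marks."""
    return [h for h, marked in annotations.items() if strip_marks(marked) != h]


if __name__ == "__main__":
    # Hypothetical submissions from two volunteers.
    alice = {"adulescens": "adulēscēns", "magnus": "magnus", "rosa": "roas"}
    bob = {"adulescens": "adulescens", "magnus": "magnus", "rosa": "rosa"}
    print(cross_check(alice, bob))
    print(letter_typos(alice))
```

A disagreement flagged this way may be a typo or a genuine difference between sources (as with "adulēscēns"), so the output is a review list for a human, not an automatic correction.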

So what do you guys think?
Founder of Learning Languages Through Video Games.
I also have a lang-8 journal where I practice Spanish and Japanese.
furrykef
Textkit Enthusiast
 
Posts: 365
Joined: Sun Feb 07, 2010 7:18 am

Re: Adding long vowels to Whitaker's Words

Post by furrykef » Thu May 27, 2010 4:00 pm

Funnily enough, right after I posted this I found that Lewis & Short can be consulted online, and it does mark vowel length (though it seems to follow the same convention as Cassell's of not marking length at all on heavy syllables). This does, I admit, significantly reduce the need for such a project. However, it also means that, if it's possible to get hold of the L&S database itself, it would be almost trivial to extract the vowel length information and put it into Whitaker's Words automatically. And Whitaker's Words is still useful if you want a quick-and-dirty gloss rather than detailed entries. Hmm...
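To make the "almost trivial" extraction concrete, here is a rough Python sketch. It assumes the marked headwords from a source such as L&S are already available as a plain list of strings, and that Whitaker's headwords are unmarked strings. Both are assumptions, since neither file format is described here, and all names are hypothetical.

```python
import unicodedata
from typing import Dict, Iterable, List, Set, Tuple


def strip_marks(word: str) -> str:
    """Drop all combining marks, so 'amīcus' becomes 'amicus'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(marked_words: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each unmarked spelling to every marked form seen for it."""
    index: Dict[str, Set[str]] = {}
    for word in marked_words:
        marked = unicodedata.normalize("NFC", word)
        index.setdefault(strip_marks(marked), set()).add(marked)
    return index


def add_vowel_length(
    headwords: Iterable[str], index: Dict[str, Set[str]]
) -> Tuple[Dict[str, str], Dict[str, List[str]], List[str]]:
    """Split headwords into resolved, ambiguous, and missing groups."""
    resolved: Dict[str, str] = {}
    ambiguous: Dict[str, List[str]] = {}
    missing: List[str] = []
    for headword in headwords:
        candidates = index.get(headword)
        if not candidates:
            missing.append(headword)
        elif len(candidates) == 1:
            resolved[headword] = next(iter(candidates))
        else:
            # Same letters, different lengths: a human has to match by sense.
            ambiguous[headword] = sorted(candidates)
    return resolved, ambiguous, missing


if __name__ == "__main__":
    source = ["amīcus", "malum", "mālum"]  # hypothetical marked headwords
    index = build_index(source)
    print(add_vowel_length(["amicus", "malum", "puella"], index))
```

The ambiguous group is where the work stops being automatic: "malum" (evil) and "mālum" (apple) share a spelling, so they can only be told apart by meaning. The result also inherits whatever the source leaves unmarked, such as long vowels in heavy syllables.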

Re: Adding long vowels to Whitaker's Words

Post by Hampie » Thu May 27, 2010 10:00 pm

Perseus Hopper is open source, so if you’re computer savvy you can download the entire site from the site itself and install it on your own server. It would be amazing to have something as fast and simple as Whitaker’s Words with vowel length. Perseus Hopper is horribly slow.
At least here I can write in Swedish, right?
Hampie
Textkit Member
 
Posts: 176
Joined: Tue Nov 07, 2006 10:51 pm
Location: Holmia, Suecia

Re: Adding long vowels to Whitaker's Words

Post by furrykef » Thu May 27, 2010 10:44 pm

Heh, you know what would be even more amazing? Being able to consult Whitaker's Words just by mousing over Latin text in your web browser. There's already a Mozilla Firefox extension that does this for Japanese (Rikaichan); doing the same thing for another language should be pretty simple.

In fact I should probably do that first before messing around with the vowel length thing, since it'd likely be quicker to do and its overall utility would be far greater.

Re: Adding long vowels to Whitaker's Words

Post by edonnelly » Fri May 28, 2010 12:00 am

furrykef wrote: Heh, you know what would be even more amazing? Being able to consult Whitaker's Words just by mousing over Latin text in your web browser.


You can easily do essentially this with Diogenes and Firefox. Diogenes is a lightning-fast program that gives you a working implementation of the Perseus interface to both the LSJ (for ancient Greek) and Lewis & Short (for Latin). It has all the macrons marked and it parses the word for you. I have my installation autostart when the computer turns on, and it runs in the system tray. I then created a custom search plugin for Firefox (basically, just put the following code, which I wrote, into a file called diogenes.xml in the Firefox searchplugins folder):

<SearchPlugin xmlns="http://www.mozilla.org/2006/browser/search/" xmlns:os="http://a9.com/-/spec/opensearch/1.1/">
<os:ShortName>Diogenes</os:ShortName>
<os:Description>Morphological search for Latin and Greek words</os:Description>
<os:InputEncoding>UTF-8</os:InputEncoding>
<os:Image width="16" height="16">data:image/x-icon;base64,R0lGODdhIAAgAIABAADv7////ywAAAAAIAAgAAACWYyPqcvtDwOIFIFZ6c3xYu54H6iII2mJaDmp67G5rwR782bI6GmveP0DnSQ94ZB47Bx1lSRT2Xg+krkiNBSEZFnbqNaKVVIT41SXXAamTew2GO2Ov2f0Or0AADs=</os:Image>
<os:Url type="text/html" method="POST" template="http://localhost:8888/Diogenes.cgi">
<os:Param name="JumpTo" value=""/>
<os:Param name="FontName" value=""/>
<os:Param name="action" value="parse"/>
<os:Param name="corpus" value="TLG+Texts"/>
<os:Param name="query" value="{searchTerms}"/>
<os:Param name="go" value="Go"/>
<os:Param name="greek_output_formatXXstate" value="UTF-8"/>
<os:Param name="current_pageXXstate" value="splash"/>
</os:Url>
</SearchPlugin>


(The above code assumes you will run Diogenes as a local program on port 8888, which I believe is the default setting.)

I then choose Diogenes as my default search engine, and any time I highlight a Greek or Latin word in Firefox, I can right-click and a new tab opens with full parsing information about the word and the full dictionary entry (or entries) for it. I wouldn't actually want it to work with a mouseover, though that would probably be fairly easy to do with a Greasemonkey script. I think it would actually be annoying, but might be fun to do just for the heck of it.

Diogenes is infinitely better than Whitaker's for everything except going from English to Greek or Latin, which it does not do. It's entirely free and open source: http://www.dur.ac.uk/p.j.heslin/Software/Diogenes/



I posted all this once before (I guess in the now-missing outside links forum), and the original probably had better instructions. Unfortunately I use Firefox so rarely anymore (now a Chrome guy) that some of the details may have slipped my mind, but all of this will get you extremely close, if not there entirely. I copied the code from the original that I wrote, so it should work just fine. I think a restart of Firefox was all that was needed once I put the text file I created into the correct folder.
The lists:
G'Oogle and the Internet Pharrchive - 1100 or so free Latin and Greek books.
DownLOEBables - Free books from the Loeb Classical Library
edonnelly
Textkit Zealot
 
Posts: 959
Joined: Sun Jan 16, 2005 2:47 am
Location: Music City, USA

Re: Adding long vowels to Whitaker's Words

Post by furrykef » Fri May 28, 2010 4:09 am

edonnelly wrote: I wouldn't actually want it to work with a mouseover, though that would probably be fairly easy to do with a Greasemonkey script. I think it would actually be annoying, but might be fun to do just for the heck of it.


I'd never be able to read any of the Roman authors by doing it your way -- not at my current level of knowledge -- because there are far too many words I don't know. You gotta think of the people like me who are reasonably well acquainted with basic Latin grammar but who have a relatively tiny Latin vocabulary.

I've never found Rikaichan remotely annoying, myself, so I don't really see why it'd be annoying here.


edonnelly wrote: Diogenes is infinitely better than Whitaker's for everything except going from English to Greek or Latin


Does doing it your way mean you'll only see glosses the way Whitaker's does? Because the idea here is to jump from the Latin word to its English meaning as fast as possible, meaning that cutting out 'noise' (anything that isn't a simple definition) is important. If the gloss isn't adequate, you can always look it up in a 'real' dictionary.

Re: Adding long vowels to Whitaker's Words

Post by edonnelly » Fri May 28, 2010 5:18 am

furrykef wrote: Does doing it your way mean you'll only see glosses the way Whitaker's does? Because the idea here is to jump from the Latin word to its English meaning as fast as possible, meaning that cutting out 'noise' (anything that isn't a simple definition) is important. If the gloss isn't adequate, you can always look it up in a 'real' dictionary.


I don't know what to tell you other than to just try it. If your definition of better is "most similar to Whitaker's" then you will probably be disappointed, but you can test it out by replacing the template="..." part in my code above with a working online Diogenes server, such as Will's (http://aoidoi.org/diogenes/Diogenes.cgi). It will be slower than having a local version, and obviously will only work with an internet connection, but it will save you the trouble of downloading and installing Diogenes while still letting you see exactly how the instant search will work.
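If editing the file by hand feels error-prone, the template swap can also be scripted. Here is a small Python sketch using only the standard library. It assumes the plugin uses the same two namespaces as the listing above; the function name and the shortened sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

MOZ_NS = "http://www.mozilla.org/2006/browser/search/"
OS_NS = "http://a9.com/-/spec/opensearch/1.1/"


def retarget_plugin(plugin_xml: str, new_template: str) -> str:
    """Return the plugin XML with the <os:Url> template pointing elsewhere."""
    # Keep the original prefixes when the tree is written back out.
    ET.register_namespace("", MOZ_NS)
    ET.register_namespace("os", OS_NS)
    root = ET.fromstring(plugin_xml)
    url = root.find(f"{{{OS_NS}}}Url")
    if url is None:
        raise ValueError("no <os:Url> element found in the plugin")
    url.set("template", new_template)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Shortened stand-in for diogenes.xml, not the full plugin.
    sample = (
        '<SearchPlugin xmlns="http://www.mozilla.org/2006/browser/search/" '
        'xmlns:os="http://a9.com/-/spec/opensearch/1.1/">'
        "<os:ShortName>Diogenes</os:ShortName>"
        '<os:Url type="text/html" method="POST" '
        'template="http://localhost:8888/Diogenes.cgi">'
        '<os:Param name="action" value="parse"/>'
        "</os:Url>"
        "</SearchPlugin>"
    )
    print(retarget_plugin(sample, "http://aoidoi.org/diogenes/Diogenes.cgi"))
```

Only the template attribute changes; the method and the Param entries are left as they were, so the plugin should behave the same apart from where the request goes.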

