Searching between the fields in online lexica?

ἑκηβόλος · November 15, 2017, 2:42pm

Is there a way to search for parts of the entries which are neither lemmata nor meanings in an online version of LSJ? I would like to be able to search for something as non-specific as “ii” and to get heaps of nonsensical results. Then perhaps to be able to search within the returned results.

Both Perseus and TLG seem to be too efficient and clean in their categorisation of elements into data fields. I realise that the quality and consistency of data in lexica varies quite a bit for different pieces of information given in different entries - with lemmata and meanings being the most reliable, but I do also want to search for a few things that do not belong in those two fields.

Barry_Hofstetter · November 15, 2017, 10:44pm

Would wildcards work, something like *ii*, or perhaps enclosing your search string in quotation marks, “ii”?

jeidsath · November 15, 2017, 11:17pm

Download the full text of the LSJ here and search in your favorite text editor.

polemistes · November 15, 2017, 11:27pm

I get a lot of results when searching for “ii” at TLG’s full corpus textual search. Of course most of the results are inflected and not dictionary forms, and you need to have access to the full corpus search to get it. I don’t know if that would be useful for you? I get things like “διιπετέος”, “διιστᾶι”, “Εὐποιίας” and “Διί”.

ἑκηβόλος · November 16, 2017, 8:13am

An interesting experiment.

In TLG, with or without the wildcard markers, a search still only returns ὡροσκόπησις, apparently because of a corrupt or incorrect inclusion of the “II” in the meaning field.

In Perseus, one specifies where in the word a string should occur, rather than using wildcards. If just a double ii is specified, then only ἀναδέω is returned, presumably also because of misaligned data in this section:

II. ἀναδῆσαι τὴν πατριὴν ἐς ἑκκαιδέκατον θεόν trace one’s family to a god in the sixteenth generation, Hdt.2.143.

In Perseus, with the wildcard option, a search returns Latin genitives in scientific names, etc., i.e. words with “ii” included in one word of the definition.

Both attempts at wildcards confirm my belief that searching is limited to fields.

Perhaps a clearer example: in LSJ, “Corinth” is included in the dictionary entry for ἄναξ, but it will not turn up if the English of the definitions is searched for “Corinth”, because “IG4.236 (Corinth)” is a type of data other than the definitional data.
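To make the distinction concrete, here is a minimal sketch of the difference between a field-limited search and a search over the whole entry. The sample markup is hand-made for illustration, and the element names (`entryFree`, `orth`, `tr`, `bibl`) are assumptions about how a TEI-encoded lexicon might tag these fields, not a description of what Perseus or the TLG actually run.

```python
import xml.etree.ElementTree as ET

# Hand-made stand-in for two dictionary entries (illustrative only).
SAMPLE = """<body>
  <entryFree>
    <orth>ἄναξ</orth>
    <sense><tr>lord, master</tr> <bibl>IG4.236 (Corinth)</bibl></sense>
  </entryFree>
  <entryFree>
    <orth>ἀναδέω</orth>
    <sense n="II"><tr>trace one's family</tr> <bibl>Hdt.2.143</bibl></sense>
  </entryFree>
</body>"""


def search_definitions(entries, needle):
    """Field-limited search: look only inside the <tr> (translation) elements."""
    return [e for e in entries
            if any(needle in (tr.text or "") for tr in e.iter("tr"))]


def search_whole_entry(entries, needle):
    """Search every piece of text in the entry, whatever field it sits in."""
    return [e for e in entries if needle in "".join(e.itertext())]


entries = list(ET.fromstring(SAMPLE).iter("entryFree"))
in_definitions = [e.findtext("orth") for e in search_definitions(entries, "Corinth")]
anywhere = [e.findtext("orth") for e in search_whole_entry(entries, "Corinth")]
print(in_definitions, anywhere)
```

The field-limited search comes back empty for “Corinth”, while the whole-entry search finds the ἄναξ entry, which is the behaviour I would like to have available online.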

ἑκηβόλος · November 16, 2017, 8:29am

That search capability of the lemmata is something that I will have to try in the future. Trying it in the public version only reveals data errors in ἑξᾰμηνιαῖος and φρυκτωρός.

Out of curiosity, in the full version is it also possible to search for, say, ϝάναξ or Κόρϝα from within their respective entries?

For me, for now, Perseus is capable of a similar search of the lemmata. In this case, though, the diairesis on the second iota of the LSJ entries runs into a long-standing indexing problem for characters with the diairesis: search results and dictionary entries do not match up when searching the Perseus LSJ, although they do for entries going to the Middle.

polemistes · November 16, 2017, 1:10pm

I can’t search for ϝ anywhere in TLG.

jeidsath · November 16, 2017, 2:29pm

For the entries with ϝ, see this post: http://discourse.textkit.com/t/in-homer/14956/1

opoudjis · November 30, 2017, 4:35am

I no longer work there, but (a) I could have sworn we mapped ϝ back to Beta code for search (which the texts are underlyingly in); (b) if in doubt, try Beta Code. Digamma in Beta code is V.
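For anyone wanting to try the Beta Code route, a rough sketch of turning a Unicode query into bare Beta Code letters might look like this. It is a simplification under stated assumptions: it strips all accents and breathings and ignores the Beta Code capital marker, so it only produces the base letters; the helper name `to_bare_beta` is made up for the example, and whether a given search box accepts the result is something to test rather than assume.

```python
import unicodedata

# Base-letter correspondences between Greek and Beta Code (digamma is V).
GREEK_TO_BETA = {
    "α": "A", "β": "B", "γ": "G", "δ": "D", "ε": "E", "ζ": "Z", "η": "H",
    "θ": "Q", "ι": "I", "κ": "K", "λ": "L", "μ": "M", "ν": "N", "ξ": "C",
    "ο": "O", "π": "P", "ρ": "R", "σ": "S", "ς": "S", "τ": "T", "υ": "U",
    "φ": "F", "χ": "X", "ψ": "Y", "ω": "W", "ϝ": "V",
}


def to_bare_beta(word):
    """Drop diacritics and map each Greek base letter to its Beta Code letter."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    letters = [c for c in decomposed if not unicodedata.combining(c)]
    # Characters without a mapping are passed through unchanged.
    return "".join(GREEK_TO_BETA.get(c, c) for c in letters)


print(to_bare_beta("ϝάναξ"))
print(to_bare_beta("Κόρϝα"))
```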

opoudjis · November 30, 2017, 4:39am

Indeed: a lot of the ii’s used to denote different meaning fields, for example, were TEI markup in the version of the LSJ that the TLG got from Perseus; and the TLG version reflects years of debugging the markup further, so it would have gotten even more “clean”. (Not perfect still, as Hekebolos reports!)

Joel is right: your best bet is to do a text search through a downloaded version of the marked up LSJ. The Morpheus distro from Perseus should include somewhere their original TEI version; it will not have the TLG corrections and improvements, but it will still have more searchable text than an archive.org OCR.
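As a rough sketch of what that local search could look like, including the "search within the returned results" step asked for at the top of the thread: the lines below are invented stand-ins for marked-up text, and the filename in the comment is hypothetical. Because the search runs over the raw markup rather than the parsed fields, sense labels such as `n="II"` are found as well.

```python
import re


def search(records, pattern, ignore_case=False):
    """Return the (line number, text) records whose text matches the pattern."""
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    return [(n, text) for n, text in records if rx.search(text)]


# Invented sample lines. With a real download one would instead read the file,
# for example (hypothetical filename):
#   with open("lsj.xml", encoding="utf-8") as f:
#       records = list(enumerate(f, start=1))
sample = [
    '<sense n="I"><tr>first meaning</tr></sense>',
    '<sense n="II"><tr>second meaning</tr> <bibl>Hdt.2.143</bibl></sense>',
    '<sense n="II"><tr>another second meaning</tr></sense>',
    '<bibl>IG4.236 (Corinth)</bibl>',
]
records = list(enumerate(sample, start=1))

broad = search(records, "ii", ignore_case=True)   # deliberately non-specific
narrow = search(broad, r"Hdt\.")                  # refine within those results

for number, text in narrow:
    print(number, text)
```

Since `search` takes and returns the same kind of list, any number of refinements can be chained one after another.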