The texts from Perseus Library

Gergian
Textkit Neophyte
Posts: 34
Joined: Wed Jan 03, 2018 6:50 pm

The texts from Perseus Library

Post by Gergian »

Perseus offers almost all of its texts for download ( http://www.perseus.tufts.edu/hopper/opensource/download ), but they are in HTML format. Does anyone here know an easy way (maybe some software, a site, etc.) to convert the HTML to a readable format?

bedwere
Global Moderator
Posts: 5098
Joined: Fri Mar 07, 2008 10:23 pm
Location: Didacopoli in California

Re: The texts from Perseus Library

Post by bedwere »

Maybe the easiest thing for you is to open an html file with your favorite browser and then copy and paste.

Gergian
Textkit Neophyte
Posts: 34
Joined: Wed Jan 03, 2018 6:50 pm

Re: The texts from Perseus Library

Post by Gergian »

But if I do this, it opens the HTML code, not the text... OK, let's leave the "easy" way aside: do I need an HTML editor or something like that?

bedwere
Global Moderator
Posts: 5098
Joined: Fri Mar 07, 2008 10:23 pm
Location: Didacopoli in California

Re: The texts from Perseus Library

Post by bedwere »

I think they are XML, not HTML files. Maybe someone else knows more.

jeidsath
Textkit Zealot
Posts: 5325
Joined: Mon Dec 30, 2013 2:42 pm
Location: Γαλεήπολις, Οὐισκόνσιν

Re: The texts from Perseus Library

Post by jeidsath »

Yes, they are XML. You need to use a programming language with an XML parser (Python, Java, etc.) to get them to spit out the text. And even then, you'll have to do a fair amount of custom coding.
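As a rough illustration of the parser approach, here is a minimal Python sketch using only the standard library. The `SAMPLE` string is a made-up, heavily simplified stand-in for a Perseus file, and `extract_text` is a hypothetical helper name. Real files are far larger, and some may declare entities or use Beta Code that this sketch does not handle, so treat it as a starting point rather than a finished converter.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a Perseus TEI file.
SAMPLE = """<TEI.2>
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <div1 type="book" n="1">
      <p><milestone unit="section" n="1"/>First section.
      <milestone unit="section" n="2"/>Second section.</p>
    </div1>
  </body></text>
</TEI.2>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def extract_text(xml_string):
    """Return the text inside <body> with all markup removed."""
    root = ET.fromstring(xml_string)
    # Look for <body> whether or not the file uses an XML namespace;
    # fall back to the whole document if there is none.
    body = next((el for el in root.iter() if local_name(el.tag) == "body"), root)
    # itertext() yields every text fragment in document order;
    # split/join collapses the line breaks and indentation.
    return " ".join("".join(body.itertext()).split())


print(extract_text(SAMPLE))
```

To run it on a downloaded file, read the file into a string first (or use `ET.parse`) and expect to add the custom handling mentioned above, for example skipping notes or keeping section numbers.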
“One might get one’s Greek from the very lips of Homer and Plato." "In which case they would certainly plough you for the Little-go. The German scholars have improved Greek so much.”

Joel Eidsath -- jeidsath@gmail.com

ἑκηβόλος
Textkit Zealot
Posts: 969
Joined: Wed Aug 07, 2013 10:19 am

Re: The texts from Perseus Library

Post by ἑκηβόλος »

On a related issue...

The text of the very first section of Daphnis and Chloe cannot be seen as a single section on Perseus:
Longus 1.1.1-2 wrote:[p. 241] κάλλιστον ὧν εἶδον: εἰκόνα, γραφήν, ἱστορίαν ἔρωτος. Καλὸν μὲν καὶ τὸ ἄλσος, πολύδενδρον, ἀνθηρόν, κατάρρυτον: μία πηγὴ πάντα ἔτρεφε, καὶ τὰ ἄνθη καὶ τὰ δένδρα: ἀλλ̓ ἡ γραφὴ τερπνοτέρα καὶ τέχνην ἔχουσα περιττὴν καὶ τύχην ἐρωτικήν: ὥστε πολλοὶ καὶ τῶν ξένων κατὰ φήμην ᾔεσαν, τῶν μὲν Νυμφῶν ἱκέται, τῆς δὲ εἰκόνος θεαταί. [2] Γυναῖκες ἐπ̓ αὐτῆς τίκτουσαι καὶ ἄλλαι σπαργάνοις κοσμοῦσαι: παιδία ἐκκείμενα, ποίμνια τρέφοντα: ποιμένες ἀναιρούμενοι, νέοι συντιθέμενοι: λῃστῶν καταδρομ
The reason that it can't be seen is that there is a line missing in the xml code, viz.

Code:

<milestone unit="section" n="1"/>
It ought to be after the <p> in

Code:

<head>*p*r*o*o*i*m*i*o*n</head>
<p>
<pb id="p.241"/>
Is the Perseus XML user-editable? Can I actually do something about that?
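On a local copy, at least, the missing line can be added with a few lines of Python. The sketch below is only an illustration under assumptions: the wrapping `<div1>` element and the placeholder text are invented so that the fragment parses on its own, and `add_first_section_milestone` is a hypothetical helper name. It does not change anything on the Perseus site itself.

```python
import xml.etree.ElementTree as ET

# The fragment quoted above, wrapped in an assumed <div1> with
# placeholder text so that it is well-formed on its own.
FRAGMENT = """<div1><head>*p*r*o*o*i*m*i*o*n</head>
<p>
<pb id="p.241"/>placeholder text of the section
</p></div1>"""


def add_first_section_milestone(xml_string):
    """Insert <milestone unit="section" n="1"/> right after the first <p>."""
    root = ET.fromstring(xml_string)
    p = root.find(".//p")
    if p is None:
        raise ValueError("no <p> element found")
    first = p[0] if len(p) else None
    already_there = (
        first is not None
        and first.tag == "milestone"
        and first.get("unit") == "section"
    )
    if not already_there:
        milestone = ET.Element("milestone", {"unit": "section", "n": "1"})
        milestone.tail = "\n"  # keep the following <pb/> on its own line
        p.insert(0, milestone)  # becomes the first child of <p>
    return ET.tostring(root, encoding="unicode")


print(add_first_section_milestone(FRAGMENT))
```

Running the function a second time on its own output leaves the text unchanged, since it checks whether the milestone is already present.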
τί δὲ ἀγαθὸν τῇ πομφόλυγι συνεστώσῃ ἢ κακὸν διαλυθείσῃ;

jeidsath
Textkit Zealot
Posts: 5325
Joined: Mon Dec 30, 2013 2:42 pm
Location: Γαλεήπολις, Οὐισκόνσιν

Re: The texts from Perseus Library

Post by jeidsath »

Email perseus_webmaster@tufts.edu to get things fixed. They've been working on a new viewer for a while now.

You could also submit a PR here, though I don't know their policy on accepting them: https://github.com/PerseusDL

ἑκηβόλος
Textkit Zealot
Posts: 969
Joined: Wed Aug 07, 2013 10:19 am

Re: The texts from Perseus Library

Post by ἑκηβόλος »

jeidsath wrote:Email perseus_webmaster@tufts.edu to get things fixed. They've been working on a new viewer for a while now.
Thanks for that.

I got a prompt and polite reply from the managing editor. It said that the problem had been logged before and has already been corrected in the GitHub repository (though not in the current P4 browser). The most up-to-date version of this work is available at:
https://github.com/PerseusDL/canonical- ... s-grc1.xml

Viewing the text in Unicode, in the most recent XML file on GitHub, is a lot more intuitive, I think.
jeidsath wrote:You could also submit a PR here, though I don't know their policy on accepting them
The policy for accepting pull requests is here:
https://github.com/PerseusDL/canonical- ... l-requests
