The texts from Perseus Library

Postby Gergian » Thu Jan 11, 2018 5:04 pm

Perseus offers almost all of its texts for download ( http://www.perseus.tufts.edu/hopper/opensource/download ), but they are in HTML format. Does anyone here know an easy way (software, a website, etc.) to convert the HTML to a readable format?
Gergian
Textkit Neophyte
 
Posts: 32
Joined: Wed Jan 03, 2018 6:50 pm

Re: The texts from Perseus Library

Postby bedwere » Thu Jan 11, 2018 11:09 pm

Maybe the easiest thing for you is to open an html file with your favorite browser and then copy and paste.
bedwere
Global Moderator
 
Posts: 3222
Joined: Fri Mar 07, 2008 10:23 pm
Location: Didacopoli in California

Re: The texts from Perseus Library

Postby Gergian » Fri Jan 12, 2018 12:59 pm

But if I do this, it opens the HTML code, not the text... OK, let's leave the "easy" way aside: do I need an HTML editor or something like that?
Gergian
Textkit Neophyte
 
Posts: 32
Joined: Wed Jan 03, 2018 6:50 pm

Re: The texts from Perseus Library

Postby bedwere » Sat Jan 13, 2018 5:19 am

I think they are XML, not HTML files. Maybe someone else knows more.
bedwere
Global Moderator
 
Posts: 3222
Joined: Fri Mar 07, 2008 10:23 pm
Location: Didacopoli in California

Re: The texts from Perseus Library

Postby jeidsath » Sun Jan 14, 2018 3:49 am

Yes, they are XML. You need to use a programming language with an XML parser (Python, Java, etc.) to get them to spit out the text. And even then, you'll have to do a fair amount of custom coding.
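
As a rough illustration of the parser approach, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The SAMPLE string and the function name extract_text are made up for this example; real Perseus files are much larger, may declare namespaces or character entities that need extra handling, and will likely need the custom coding mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document, loosely modelled on TEI-style markup.
# It is NOT copied from a Perseus file.
SAMPLE = (
    '<TEI.2><teiHeader><title>Sample</title></teiHeader>'
    '<text><body><div1 type="book" n="1"><head>Heading</head>'
    '<p><pb id="p.1"/><milestone unit="section" n="1"/>first section text '
    '<milestone unit="section" n="2"/>second section text</p>'
    '</div1></body></text></TEI.2>'
)


def local_name(tag):
    """Drop a '{namespace}' prefix from a tag name, if there is one."""
    return tag.rsplit("}", 1)[-1]


def extract_text(xml_string):
    """Return the readable text inside <body>, with all markup removed."""
    root = ET.fromstring(xml_string)
    # Look for the first <body> element, whether or not it is namespaced.
    body = next(
        (el for el in root.iter() if local_name(el.tag) == "body"),
        root,  # fall back to the whole document if there is no <body>
    )
    # itertext() walks every text node in document order.
    pieces = (chunk.strip() for chunk in body.itertext())
    return " ".join(piece for piece in pieces if piece)


if __name__ == "__main__":
    print(extract_text(SAMPLE))
```

For a downloaded file, the same function could be given the file's contents read as a string. Older files that rely on a DTD for named entities may fail to parse with this simple approach.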
Joel Eidsath -- jeidsath@gmail.com

μὴ δ’ οὕτως ἀγαθός περ ἐὼν θεοείκελ’ Ἀχιλλεῦ
κλέπτε νόῳ, ἐπεὶ οὐ παρελεύσεαι οὐδέ με πείσεις.
jeidsath
Administrator
 
Posts: 2387
Joined: Mon Dec 30, 2013 2:42 pm
Location: Γαλεήπολις, Οὐισκόνσιν

Re: The texts from Perseus Library

Postby ἑκηβόλος » Wed May 23, 2018 7:16 am

On a related issue...

The text of the very first section of Daphnis and Chloe cannot be seen as a single section on Perseus:
Longus 1.1.1-2 wrote:[p. 241] κάλλιστον ὧν εἶδον: εἰκόνα, γραφήν, ἱστορίαν ἔρωτος. Καλὸν μὲν καὶ τὸ ἄλσος, πολύδενδρον, ἀνθηρόν, κατάρρυτον: μία πηγὴ πάντα ἔτρεφε, καὶ τὰ ἄνθη καὶ τὰ δένδρα: ἀλλ̓ ἡ γραφὴ τερπνοτέρα καὶ τέχνην ἔχουσα περιττὴν καὶ τύχην ἐρωτικήν: ὥστε πολλοὶ καὶ τῶν ξένων κατὰ φήμην ᾔεσαν, τῶν μὲν Νυμφῶν ἱκέται, τῆς δὲ εἰκόνος θεαταί. [2] Γυναῖκες ἐπ̓ αὐτῆς τίκτουσαι καὶ ἄλλαι σπαργάνοις κοσμοῦσαι: παιδία ἐκκείμενα, ποίμνια τρέφοντα: ποιμένες ἀναιρούμενοι, νέοι συντιθέμενοι: λῃστῶν καταδρομ


The reason it can't be seen is that there is a line missing in the XML code, viz.
Code:
<milestone unit="section" n="1"/>


It ought to be after the <p> in
Code:
<head>*p*r*o*o*i*m*i*o*n</head>
<p>
<pb id="p.241"/>


Is the Perseus xml user editable? Can I actually do something about that?
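
On a local copy of the file, the fix described above could be sketched in Python roughly as follows. This is only an illustration: FRAGMENT is a shortened stand-in for the real markup (placeholder text instead of the Greek), the function name add_first_milestone is hypothetical, and placing the new element immediately after the page break is one reading of "after the <p>".

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the markup quoted above; the section text is
# placeholder English, not the actual content of the file.
FRAGMENT = (
    '<div1><head>*p*r*o*o*i*m*i*o*n</head>'
    '<p><pb id="p.241"/>text of section one '
    '<milestone unit="section" n="2"/>text of section two</p></div1>'
)


def add_first_milestone(xml_string):
    """Insert <milestone unit="section" n="1"/> right after the <pb/>."""
    root = ET.fromstring(xml_string)
    para = root.find(".//p")
    page_break = para.find("pb")

    milestone = ET.Element("milestone", {"unit": "section", "n": "1"})
    # In ElementTree, the text following an empty element is stored as
    # that element's "tail". Move it so the new milestone comes BEFORE
    # the section text rather than after it.
    milestone.tail = page_break.tail
    page_break.tail = None

    position = list(para).index(page_break) + 1
    para.insert(position, milestone)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_first_milestone(FRAGMENT))
```

This only changes a local copy; getting the change into Perseus itself would still have to go through the maintainers.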
The child is the father of the man.
(W.W., 1802)
ἑκηβόλος
Textkit Enthusiast
 
Posts: 503
Joined: Wed Aug 07, 2013 10:19 am
Location: Nanchang, PRC

Re: The texts from Perseus Library

Postby jeidsath » Wed May 23, 2018 8:59 pm

Email perseus_webmaster@tufts.edu to get things fixed. They've been working on a new viewer for a while now.

You could also submit a PR here, though I don't know their policy on accepting them: https://github.com/PerseusDL
jeidsath
Administrator
 
Posts: 2387
Joined: Mon Dec 30, 2013 2:42 pm
Location: Γαλεήπολις, Οὐισκόνσιν

Re: The texts from Perseus Library

Postby ἑκηβόλος » Fri May 25, 2018 12:14 am

jeidsath wrote:Email perseus_webmaster@tufts.edu to get things fixed. They've been working on a new viewer for a while now.
https://github.com/PerseusDL

Thanks for that.

I got a prompt and polite reply from the managing editor. It said that the problem had been logged before and has already been fixed in the GitHub repository (but not in the current P4 browser). The most up-to-date version of this work is available at:
https://github.com/PerseusDL/canonical- ... s-grc1.xml

Viewing the text in Unicode, in the most recent XML file on GitHub, is a lot more intuitive, I think.

jeidsath wrote:You could also submit a PR here, though I don't know their policy on accepting them
The policy for accepting pull requests is here:
https://github.com/PerseusDL/canonical- ... l-requests
ἑκηβόλος
Textkit Enthusiast
 
Posts: 503
Joined: Wed Aug 07, 2013 10:19 am
Location: Nanchang, PRC
