The texts from Perseus Library

Perseus offers almost all of its texts for download (http://www.perseus.tufts.edu/hopper/opensource/download), but they are in HTML format. Does anyone here know an easy way (software, a website, etc.) to convert the HTML to a readable format?

Maybe the easiest thing for you is to open an html file with your favorite browser and then copy and paste.

But if I do this, what opens is the HTML code, not the text… OK, let's set aside the “easy” way. Do I need an HTML editor or something like that?

I think they are XML, not HTML files. Maybe someone else knows more.

Yes, they are XML. You need to use a programming language with an XML parser (Python, Java, etc.) to get them to spit out the text. And even then, you’ll have to do a fair amount of custom coding.
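To give an idea of what that involves, here is a minimal sketch in Python using the standard-library `xml.etree.ElementTree` parser. The sample document, the function name `extract_text`, and the choice of tags to skip (`teiHeader`, `note`) are all assumptions made for illustration, not taken from an actual Perseus file. Real files may also contain entity declarations or namespaces that need extra handling before they parse cleanly.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for a downloaded file; the real
# Perseus markup is richer than this.
SAMPLE = """<TEI.2>
  <teiHeader><title>Sample</title></teiHeader>
  <text>
    <body>
      <head>Prooimion</head>
      <p><milestone unit="section" n="1"/>First sentence.
         <note>An editorial note.</note> Second sentence.</p>
    </body>
  </text>
</TEI.2>"""


def extract_text(xml_string, skip_tags=("teiHeader", "note")):
    """Return the readable text of an XML string, skipping some elements."""
    root = ET.fromstring(xml_string)
    parts = []

    def walk(elem):
        # Drop any "{namespace}" prefix so tag names compare simply.
        tag = elem.tag.split("}")[-1]
        if tag not in skip_tags:
            if elem.text:
                parts.append(elem.text)
            for child in elem:
                walk(child)
        # Text after the closing tag belongs to the parent, so keep it
        # even when the element itself was skipped.
        if elem.tail:
            parts.append(elem.tail)

    walk(root)
    # Collapse runs of whitespace and line breaks into single spaces.
    return " ".join("".join(parts).split())


if __name__ == "__main__":
    print(extract_text(SAMPLE))
```

The custom coding mentioned above mostly comes down to deciding which elements to keep, skip, or turn into line breaks for a given text.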

On a related issue…

The text of the very first section of Daphnis and Chloe cannot be seen as a single section on Perseus:

It can’t be seen because a line is missing from the XML code, viz.

<milestone unit="section" n="1"/>

It ought to be after the

in

<head>*p*r*o*o*i*m*i*o*n</head>
<p>
<pb id="p.241"/>
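A gap like this can be spotted with a short script. The following is only a sketch: the sample XML is a made-up, well-formed stand-in (the `div1` wrapper and the section text are assumptions), and `missing_sections` is a hypothetical helper, not part of any Perseus tooling. It collects the `n` values of all `milestone` elements with `unit="section"` and reports which numbers are absent.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed stand-in for the passage in question;
# the <div1> wrapper and the text are assumptions for illustration.
SAMPLE = """<div1>
<head>*p*r*o*o*i*m*i*o*n</head>
<p>
<pb id="p.241"/>
Text of the first section.
<milestone unit="section" n="2"/>
Text of the second section.
</p>
</div1>"""


def missing_sections(xml_string):
    """List section numbers below the highest one that have no milestone."""
    root = ET.fromstring(xml_string)
    found = {
        int(m.get("n"))
        for m in root.iter("milestone")
        if m.get("unit") == "section" and (m.get("n") or "").isdigit()
    }
    if not found:
        return []
    return [n for n in range(1, max(found) + 1) if n not in found]


if __name__ == "__main__":
    print(missing_sections(SAMPLE))
```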

Is the Perseus XML user-editable? Can I actually do something about that?

Email perseus_webmaster@tufts.edu to get things fixed. They’ve been working on a new viewer for a while now.

You could also submit a PR here, though I don’t know their policy on accepting them: https://github.com/PerseusDL

Thanks for that.

I got a prompt and polite reply from the managing editor. It said that the problem had been logged before and has been fixed in the GitHub repository (but not in the current P4 browser). The most up-to-date version of this work is available at:
https://github.com/PerseusDL/canonical-greekLit/blob/master/data/tlg0561/tlg001/tlg0561.tlg001.perseus-grc1.xml

Viewing the text in the most recent XML file on GitHub, in Unicode, is a lot more intuitive, I think.

The policy for accepting pull requests is here:
https://github.com/PerseusDL/canonical-greekLit/wiki/Submit-a-correction-using-GitHub-pull-requests