Old 06-29-2012, 06:41 PM   #1
cptnemo
Enthusiast
Posts: 35
Karma: 10
Join Date: Oct 2011
Device: Kindle 3
Help on project to create ePub of the online Stanford Encyclopedia of Philosophy

Hello,

I have long been a heavy user of the Stanford Encyclopedia of Philosophy, an open collection of about 1300 articles on philosophers and philosophical topics.

What I would like to create is an ePub version of the website, so that I can always carry the Encyclopedia in my pocket.

What I have done so far is download the articles with wget (about 270 MB) with these options: I excluded js, txt and css files; I fetched all the images needed to display the pages; and, importantly, instead of creating directories I asked wget to put all files in one folder (with no subdirectories) and to correct all internal links accordingly.

Basically, I created a functioning single-folder website on my desktop. I thought the next step would be easy: I was wrong.

I tried:

1) to handcraft an ePub by adding the required XML files and folder structure (mimetype, OEBPS, META-INF, etc.) following these instructions (I added over 4000 elements to Content.opf), and then to create my .zip/epub file with this nice AppleScript. But the ePub does not work: I tried unsuccessfully to open it with Sigil (it crashed) and with Calibre (it opens, but the internal links of the ePub do not work; they point to files on my desktop, not in the ePub).

2) I tried to convert my handcrafted ePub with Calibre into another ePub. Calibre crashed.

3) I tried to create an ePub with Calibre opening the contents.html file from desktop. Calibre crashed.

Do you think my file is too big? Should I preprocess my HTML files in some way before converting?
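
For the packaging step in attempt 1, here is a minimal Python sketch of one way to zip a handcrafted folder into an .epub. It relies on the EPUB container rule that the `mimetype` entry comes first in the archive and is stored uncompressed. The function name `build_epub` and the folder layout are assumptions for illustration. Whether the zip step was actually what made Sigil and Calibre fail here is not known.

```python
import os
import zipfile


def build_epub(src_dir, out_path):
    """Zip src_dir into out_path, writing the mimetype entry first and uncompressed."""
    with zipfile.ZipFile(out_path, "w") as zf:
        # The "mimetype" entry must come first and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Do not pack the output file into itself.
                if os.path.abspath(full) == os.path.abspath(out_path):
                    continue
                # Archive names are relative to src_dir and use forward slashes.
                arc = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arc == "mimetype":
                    continue  # already written above
                zf.write(full, arc, compress_type=zipfile.ZIP_DEFLATED)
```

A call such as `build_epub("seop_book", "seop.epub")` (both names hypothetical) would pack everything under `seop_book`, including `META-INF/container.xml` and the `OEBPS` folder.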
Old 06-30-2012, 04:02 AM   #2
Jellby
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Quote:
Originally Posted by cptnemo View Post
and with Calibre (it opens but internal links of the ePub do not work, they point to files on my desktop not in the ePub).
Make sure the internal links in your files are relative links, not absolute.

As for other problems, my advice is to start with a small portion of the pages, so it is easier to test and debug.
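
One way to check for absolute links before packaging is sketched below in Python, using only the standard library. It collects every `href` and `src` value in a page and reports those that have a URL scheme (such as `file:` or `http:`) or start with `/`. The names `LinkCollector` and `find_absolute_links` are made up for this example. Note that it also flags external web links and `mailto:` links, which may or may not be wanted.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the values of all href and src attributes in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_absolute_links(html_text):
    """Return the links that are not relative to the current file."""
    collector = LinkCollector()
    collector.feed(html_text)
    absolute = []
    for link in collector.links:
        # A scheme (file:, http:, ...) or a leading slash means "not relative".
        if urlparse(link).scheme or link.startswith("/"):
            absolute.append(link)
    return absolute
```

Running it over a small subset of the pages first, as suggested above, keeps the list of offenders short enough to inspect by hand.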
Old 06-30-2012, 05:44 AM   #3
mmat1
Berti
Posts: 1,196
Karma: 4985964
Join Date: Jan 2012
Location: Zischebattem
Device: Acer Lumiread
Quote:
Originally Posted by cptnemo View Post
... (about 270 mb) ...
...(I add over 4000 elements to Content.opf)...
... maybe a bit too large
Old 06-30-2012, 06:50 AM   #4
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
Even if you get it to work, VERY SLOWLY, the Table of Contents may be larger than most entire books, especially when displayed on a small screen.

Certainly there are topic areas which lend themselves to being made into books of their own. At least on the Sony Readers you can create Collections; this sort of thing may be helpful.

Make sure the links don't point outside of the device, as some devices are not connected to the internet all the time or do not accept references to the web.

The Harvard Classics, which your project sort of reminds me of, take up 50 paper books, and the e-reader files keep the same number of books.

Links are case-sensitive in epubs, so you will have to watch for that too.

You might try just copying and pasting articles one at a time into Sigil, making sure to insert chapter breaks so that some readers will not fail because the sections are over about 250k.

You could use something similar to HTTrack to get the whole web site on your computer, so you have access to all the bits and pieces as you construct your book.
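
As a rough illustration of the case-sensitivity point, the Python sketch below compares link targets against the file names that will go into the ePub and reports targets that match only when case is ignored. It assumes the link targets and file names have already been collected. The function name `find_case_mismatches` and the sample names are hypothetical.

```python
def find_case_mismatches(link_targets, filenames):
    """Return (link, actual_filename) pairs that differ only in letter case."""
    by_lower = {name.lower(): name for name in filenames}
    mismatches = []
    for target in link_targets:
        # Drop any "#fragment" part; only the file name matters here.
        path = target.split("#", 1)[0]
        if not path or path in filenames:
            continue  # same-page link, or an exact match
        actual = by_lower.get(path.lower())
        if actual is not None:
            mismatches.append((target, actual))
    return mismatches


files = {"Kant.html", "hume.html"}
links = ["kant.html#bio", "hume.html", "#top", "missing.html"]
print(find_case_mismatches(links, files))
```

Targets that do not match any file at all (like `missing.html` here) are ignored by this check and would need a separate report.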
Old 06-30-2012, 08:18 AM   #5
AlPe
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
Hi cptnemo,

The previous observations are correct: besides technical things, like internal links and the like (have you tried validating your EPUB after creating it?), the file/TOC/OPF-content size might be a problem for most readers. I second the previous suggestion to try splitting the files into smaller EPUBs, e.g., by starting letter.

I did a similar, but smaller, project by fetching the Divina Commedia with several commentaries from the Dartmouth Dante Project (you can get a proof-of-concept here), and packing it into a single EPUB (the resulting EPUB is roughly 3 MB, with 100 XHTML pages and 10,000 notes/comments).

I am quite busy these days, but by the middle of next week I could take a look at it, if you would like to work on this 'project'.
Old 08-25-2013, 03:50 PM   #6
Zeno_
Member
Posts: 21
Karma: 10
Join Date: Jun 2012
Device: Sony PRS-T1
I had the same idea as the thread starter (TS).
Because I just convert the HTML files to ePub, the file won't be that big.

So it can be done, but there are some problems.

As of now:
- Some logic symbols aren't rendered correctly.
- There is a lot of gibberish before the article starts. I would be able to remove it if I find a way to add extra CSS to every HTML file. This could be a setting from a mass downloader, appending it to every HTML file it downloads, or maybe someone knows a way to add something to a lot of HTML files.
- The TOC is HUGE.
- The title of every subject appears twice: once as the web page title (with Stanford etc. etc. added) and once as the title of the topic.
- Internal links don't work, only an index file does (can be solved manually, I can convert words that contain an exact title of another topic automatically).

Solvable problems, not even big problems, but if someone can help me with the second one in particular, that would be great.
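
For the second problem (adding the same extra CSS to many HTML files), one possible approach is a short Python script that inserts a stylesheet link just before `</head>` in every file of a folder. This is only a sketch under assumptions: the stylesheet name `extra.css`, the function names, and UTF-8 encoded files with a `</head>` tag are all assumed, not taken from the actual downloaded pages.

```python
import re
from pathlib import Path

HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def add_stylesheet_link(html_text, css_href="extra.css"):
    """Insert a stylesheet link before </head>, unless it is already there."""
    link = '<link rel="stylesheet" type="text/css" href="%s"/>' % css_href
    if link in html_text:
        return html_text  # already patched, leave the page alone
    # Pages without a </head> tag are returned unchanged.
    return HEAD_CLOSE.sub(lambda m: link + "\n" + m.group(0), html_text, count=1)


def add_stylesheet_to_folder(folder, css_href="extra.css"):
    """Patch every .html file in folder; return how many files changed."""
    changed = 0
    for path in sorted(Path(folder).glob("*.html")):
        original = path.read_text(encoding="utf-8")  # assumed encoding
        updated = add_stylesheet_link(original, css_href)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

The extra stylesheet could then hide the unwanted page furniture with `display: none` rules, provided the converter honours them.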

Last edited by Zeno_; 08-25-2013 at 03:54 PM.
Old 08-25-2013, 03:58 PM   #7
AlPe
If you look at the source code of one article page, you will notice that the useful stuff is contained within two comments:

Code:
<div id="aueditable"><!--DO NOT MODIFY THIS LINE AND ABOVE-->
<h1>Abduction</h1><div id="pubinfo"><em>First published Wed Mar 9, 2011</em></div>
...
</div><!-- #aueditable --><!--DO NOT MODIFY THIS LINE AND BELOW-->
My suggestion is to download every article, then grab the part of the source within those comments, and create your own header/footer, and a CSS where you will use the classes defined in the original CSS of the SEoP articles.
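
A minimal Python sketch of that extraction step follows. The two marker comments are copied from the snippet above; everything else (the function names, the stylesheet name `seop.css`, the wrapper markup) is an assumption for illustration. Because the closing `</div><!-- #aueditable -->` sits before the second comment while its opening `<div>` sits before the first, the sketch strips that trailing close tag so the extracted fragment stays balanced. It does not guarantee that the fragment is well-formed XHTML.

```python
import html

START = "<!--DO NOT MODIFY THIS LINE AND ABOVE-->"
END = "<!--DO NOT MODIFY THIS LINE AND BELOW-->"
TRAILER = "</div><!-- #aueditable -->"


def extract_article(page_html):
    """Return the markup between the two marker comments, or None if absent."""
    start = page_html.find(START)
    if start == -1:
        return None
    end = page_html.find(END, start + len(START))
    if end == -1:
        return None
    body = page_html[start + len(START):end].strip()
    # The opening <div id="aueditable"> lies before START, so drop its close.
    if body.endswith(TRAILER):
        body = body[:-len(TRAILER)].rstrip()
    return body


def wrap_article(title, body, css_href="seop.css"):
    """Wrap an extracted fragment in a minimal custom header and footer."""
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        "<head>\n"
        "<title>%s</title>\n"
        '<link rel="stylesheet" type="text/css" href="%s"/>\n'
        "</head>\n"
        "<body>\n%s\n</body>\n"
        "</html>\n"
    ) % (html.escape(title), html.escape(css_href, quote=True), body)
```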
Old 08-26-2013, 03:05 PM   #8
AlPe
A quick Bash script told me that currently there are 1385 articles linked from the online TOC ( http://plato.stanford.edu/contents.html ), and that the uncompressed size of all the HTML pages is ~127 MB.

If you are going to create a single EPUB, I doubt eReaders will open it, due to its size. Tablet apps might be ok.

In that case, do not put every entry in the TOC. Just put "Letter A", "Letter B", etc. linking to an XHTML page that contains links to the entries. Probably a 2 level TOC ("A" > "AB", "AC", ... | "B" > "BA", "BE", etc.) would still be manageable by most reading systems.
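
The two-level grouping can be sketched in a few lines of Python. The function name `build_two_level_toc` and the sample titles are illustrative only. Titles are grouped by their first letter and then by their first two characters, upper-cased, so a title whose second character is a space or punctuation gets a prefix such as "A ".

```python
def build_two_level_toc(titles):
    """Group titles as {letter: {two-character prefix: [titles]}}."""
    toc = {}
    for title in sorted(titles, key=str.lower):
        if not title:
            continue  # skip empty entries
        prefix = title[:2].upper()
        letter = prefix[:1]
        toc.setdefault(letter, {}).setdefault(prefix, []).append(title)
    return toc


sample = ["Abduction", "Actualism", "Abelard, Peter", "Bacon, Francis"]
for letter, groups in build_two_level_toc(sample).items():
    for prefix, entries in groups.items():
        print(letter, ">", prefix, ">", entries)
```

Only the letters and prefixes would go into the TOC itself; the entries under each prefix would be plain links on an ordinary XHTML page.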

EDIT: from a quick test, the EPUB containing all the articles is around 40 MB. My iPad opens it, but it is very slow at opening it from the library and at navigating the TOC. My Kobo Glo is simply not able to open it at all (or, more precisely, it still seems to be trying after ~5 minutes).

EDIT: with a bit of labor, one can remove the junk and get a decent EPUB, still slow to load from the library, but at least the navigation is not intolerably slow. See attached screenshots.
Attached screenshots: seop1.png, seop2.png

Last edited by AlPe; 08-26-2013 at 04:42 PM.
Old 08-31-2013, 01:10 PM   #9
Zeno_
I underestimated how many entries there were.
Now I understand the problem with the file size.
Kinda defeats the purpose of trying for me if I can't open it with my PRS-T1.

For iPad, I'd recommend an app like Offline pages or something similar.
Then you can just download a page every now and then if you need it offline.
For iPhone there is also an app from Stanford, but it requires an internet connection to browse.

Last edited by Zeno_; 08-31-2013 at 01:13 PM.
Old 09-01-2013, 05:47 AM   #10
AlPe
Quote:
Originally Posted by Zeno_ View Post
I underestimated how many entries there were
Now I understand the problem with the file size.
Kinda defeats the purpose of trying for me if I can't open it with my PRS-T1.
But you can either:

1) select only the articles you want to read and bundle them into a single EPUB file (like Wikipedia allows) or

2) split the entire encyclopedia in multiple EPUB files.

I do not have my T1 with me, but my Kobo Glo has just accepted an EPUB of size 5 MB, with a subset of the articles. Fewer than 10 such chunks should be enough for the whole SEoP. Clearly you lose the chance of clicking internal links if you are on an A-article and you want to jump to a Z-article, but you gain the possibility of reading it on e-ink, which is still very nice.
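
Splitting the articles into chunks can be sketched as a simple greedy pass over the files in alphabetical order, starting a new chunk whenever the next file would push the running total over a size budget. The Python below is only an illustration: the function name `chunk_by_size` and the sample sizes are made up, and since it works on uncompressed file sizes while the 5 MB figure above is the size of the finished EPUB, the budget would have to be tuned by trial.

```python
def chunk_by_size(files, max_bytes):
    """Split (name, size) pairs into consecutive chunks of at most max_bytes.

    A single file larger than max_bytes still gets a chunk of its own.
    """
    chunks = []
    current = []
    current_size = 0
    for name, size in files:
        # Start a new chunk if adding this file would exceed the budget.
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current = []
            current_size = 0
        current.append(name)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


sample = [("a.html", 3), ("b.html", 3), ("c.html", 5), ("d.html", 1)]
print(chunk_by_size(sample, max_bytes=6))
```

Keeping the chunks alphabetical means each EPUB covers a contiguous range of entries, so links between neighbouring articles are more likely to stay inside one file.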

Actually, I was thinking of writing an email to the SEoP guys, asking whether they could provide an "official" EPUB file (or multiple chunks). I always find it odd that universities/foundations put data online, without protection from screen scraping, but do not provide access to the raw data or produce reasonable output formats. Unlike advertising-based sites, they do not gain anything from having people visit their Web pages.
Old 09-01-2013, 06:47 AM   #11
mrmikel
Since it is an academic document, it is probably littered with footnotes, etc. Since it is for private use, why not just dump them all overboard? It might reduce the book to a reasonable size. You might also get rid of indexes, etc., if the articles are all old friends.

You can always look up things you have questions about when you get to a computer.