#1
Enthusiast
Posts: 35
Karma: 10
Join Date: Jun 2007
Location: United Kingdom
Device: iPad Mini, Nexus 7, Sony Reader, Kindle, and others.
ePubBooks.com: Gulliver's Travels...with images
I have just made available an EPUB version of Gulliver's Travels by Jonathan Swift over on the ePub Books Blog. I'm releasing this title because it contains lots of footnotes and images.
For the last few months I've been creating some scripts to convert the Project Gutenberg TXT files into the epub format. I have now finished those and have made this title available for everyone to try out while I'm working on building the new website. http://www.epubbooks.com/blog/200812...ub-ebook-test/ I would love to hear your feedback on everything from the frontend formatting to the underlying XML coding.
#2
creator of calibre
Posts: 45,199
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Since you're using UTF-8 encoding anyway, I suggest replacing the numeric entities with UTF-8 characters. It makes for smaller file sizes and easier parsing.
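A minimal Python sketch of that replacement, assuming the files are already UTF-8 encoded XHTML. References to markup-significant characters (`&`, `<`, `>` and quotes) are deliberately left as entities so the output stays well-formed; the function name is hypothetical.

```python
import re

# Decimal (&#8220;) or hexadecimal (&#x201C;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")

# Characters that must stay escaped to keep the XML well-formed.
KEEP_ESCAPED = {"&", "<", ">", '"', "'"}


def numeric_refs_to_utf8(xml_text: str) -> str:
    """Replace numeric character references with the literal characters."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave out-of-range code points and lone surrogates untouched.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        char = chr(code_point)
        if char in KEEP_ESCAPED:
            return match.group(0)
        return char

    return NUMERIC_REF.sub(replace, xml_text)


# Example:
# numeric_refs_to_utf8("<p>&#8220;Lilliput&#8221; &#38; Blefuscu</p>")
# returns '<p>“Lilliput” &#38; Blefuscu</p>'
```

The result then has to be written back with `encoding="utf-8"` so it matches the encoding the file declares.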
Are you planning to release your scripts to convert the Gutenberg TXT markup to HTML? I've found that Gutenberg books tend to have a lot of variation in their markup. How well does your script handle that?
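One way to cope with the variation in Gutenberg chapter markup is a tolerant heading pattern plus a layout check (assuming heading lines follow a blank line). The patterns below are illustrative assumptions, not a description of the ePubBooks scripts, and real PG texts will need more cases.

```python
import re

# Illustrative patterns only: "CHAPTER I.", "Chapter 12: The End",
# "PART I. A VOYAGE TO LILLIPUT." and similar variants.
HEADING = re.compile(
    r"""^\s*
        (?P<kind>CHAPTER|Chapter|PART|Part|BOOK|Book)\s+
        (?P<number>[IVXLCDM]+|\d+)\b      # Roman or Arabic numeral
        \.?                               # optional trailing dot
        (?:\s*[.:\-]?\s*(?P<title>.+?))?  # optional title on the same line
        \s*$""",
    re.VERBOSE,
)


def find_headings(text: str):
    """Yield (line_index, kind, number, title) for likely chapter headings."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        # Assumption: a real heading stands on its own after a blank line.
        after_blank = i == 0 or not lines[i - 1].strip()
        match = HEADING.match(line)
        if match and after_blank and len(line) < 80:
            yield i, match["kind"].upper(), match["number"], match["title"]
```

Requiring a preceding blank line trades a few missed headings for far fewer false positives, such as a line of running text that happens to start with "Part 2 of the plan".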
#3
Enthusiast
Quote:
Quote:
The answer to your question is yes... and no. I do hope to release the scripts in the future, but at the moment they are not really very user-friendly or robust. I currently have 90 files converted (my test base), but once my site is live I will start churning out more titles, which means I can start to improve the scripts.
Quote:
At the moment my footnote routines only do half their job... thankfully most files only have a few footnotes, so it hasn't been a big problem. Still, I will improve this soon.
I catch most quotes, but some are missing from (not included in) the source, and I still confuse single quotes with the apostrophes used in word contractions, e.g. 'nothin' could help 'im save the world'. It shouldn't be too hard to fix most, if not all, of these.
There are currently two really big areas that need improvement to speed up conversion.
Frontmatter: I process as much as I can, but I still need to include the original frontmatter (between the TEI header and the first chapter) so I can double-check everything and add any missing info to the teiHeader and front sections.
Images: I can automatically mark up the TEI for images, but the PG TXT files don't contain any filename information. For images I need to go through the HTML version and manually add the filenames to the TEI. I should be able to add functionality to the script to read the files from disk and populate the TEI file, but this is still prone to errors.
Basically, there is always going to be some manual work needed, but I hope to reduce this to a minimum pretty quickly.
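A rough Python sketch of the "read the files from disk and populate the TEI" step. It assumes (hypothetically) that the image files sort into the same order as the `<figure>` elements in the text and that they are referenced under an `images/` prefix. It refuses to guess when the counts differ, which is where the manual check still comes in.

```python
import os
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def populate_figures(tei_xml: str, image_dir: str, url_prefix: str = "images/") -> str:
    """Give each <figure> lacking a <graphic> the next image file from image_dir.

    Hypothetical assumption: file names sort in reading order,
    e.g. illus001.jpg, illus002.jpg, ...
    """
    root = ET.fromstring(tei_xml)
    files = sorted(
        name for name in os.listdir(image_dir)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )
    figures = [
        fig for fig in root.iter(f"{{{TEI_NS}}}figure")
        if fig.find(f"{{{TEI_NS}}}graphic") is None
    ]
    if len(files) != len(figures):
        # A mismatch usually means the text needs checking by hand.
        raise ValueError(f"{len(figures)} figures but {len(files)} image files")
    for fig, name in zip(figures, files):
        # Put the <graphic> first, before any <head> or <figDesc>.
        fig.insert(0, ET.Element(f"{{{TEI_NS}}}graphic", url=url_prefix + name))
    return ET.tostring(root, encoding="unicode")
```

Matching by sort order is the weak point: if an image is missing from disk or the files are named out of sequence, every later figure shifts by one, so the count check is only a first line of defence.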
#4
creator of calibre
Since you parse the HTML versions for images anyway, why not use those as the source? They are less inconsistent than the TXT files, and you could fall back to the TXT only when no HTML version is present.
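If the HTML version is used only for its image filenames, the standard library's `html.parser` is enough to pull out the `<img src>` values in document order. The `pick_source` helper and the local directory layout it expects are hypothetical. It shows one way to prefer HTML and fall back to TXT.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import List, Optional


class ImageSrcCollector(HTMLParser):
    """Collects the src of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; <img/> also lands here.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_sources(html_text: str) -> List[str]:
    parser = ImageSrcCollector()
    parser.feed(html_text)
    parser.close()
    return parser.sources


def pick_source(book_dir: Path) -> Optional[Path]:
    """Prefer an HTML file in book_dir, else fall back to a TXT file.

    The *.html / *.htm / *.txt layout is an assumption about how the
    downloads are stored locally, not a Project Gutenberg guarantee.
    """
    for pattern in ("*.html", "*.htm", "*.txt"):
        matches = sorted(book_dir.glob(pattern))
        if matches:
            return matches[0]
    return None
```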
#5
Enthusiast
Actually, I parse the TXT version! ...does this make me crazy?
At the time, it seemed like the HTML versions would present more problems, not fewer. Yes, chapters and paragraphs were already done, but there's often a lot of variation in other aspects, so I don't believe they would have made things any easier. Plus there was potential for messy mark-up... I really wanted to keep things ultra clean. Whether that was the right decision or not, I won't change things now.
#6
creator of calibre
If you're going for a manual conversion approach, then the TXT files make sense, since they will, as you say, yield cleaner epub files.
#7
Enthusiast
For sure, although I'm hoping to reduce the 'manual' labour to a minimum. It's as much about producing clean TEI/epub files as it is converting the PG catalogue. It will take longer to build up the ePub book catalogue, but I think the results will be well worth it.
#8
creator of calibre
Quote:
#9
creator of calibre
Another suggestion: add class="chapter_heading" to the chapter headings.
Also, if you are able to identify image captions, you should put them into the alt attribute of the img tags instead of the generic "Illustration", and add class="image_caption" to the image captions in the text itself.
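A small sketch of the markup being suggested. The class names come from the suggestion; rendering chapter headings as `<h2>` is an assumption. Captions go through `html.escape`, so quotes and ampersands in them can't break the alt attribute.

```python
from html import escape


def chapter_heading(title: str) -> str:
    """Chapter heading with the suggested class (h2 is an assumption)."""
    return f'<h2 class="chapter_heading">{escape(title)}</h2>'


def figure_markup(src: str, caption: str = "") -> str:
    """An <img> whose alt text is the caption, plus a visible caption line."""
    alt = caption or "Illustration"  # generic alt only when no caption is known
    markup = f'<img src="{escape(src)}" alt="{escape(alt)}"/>'
    if caption:
        markup += f'\n<p class="image_caption">{escape(caption)}</p>'
    return markup


print(figure_markup("images/p1.jpg", 'The Emperor & his "court"'))
# prints:
# <img src="images/p1.jpg" alt="The Emperor &amp; his &quot;court&quot;"/>
# <p class="image_caption">The Emperor &amp; his &quot;court&quot;</p>
```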
#10
Enthusiast
Quote:
Some images in the PG files have both a caption and a description, so the TEI is marked up that way. I can't remember now the reasoning for taking the alt attribute from the <figDesc> tag, but perhaps this needs rethinking.
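For TEI figures that carry both a `<head>` caption and a `<figDesc>` description, one possible policy (following Kovid's caption-as-alt suggestion rather than the current figDesc behaviour) is to use the caption for the alt text and fall back to the description. A sketch with the standard library's ElementTree, assuming the standard TEI namespace; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def _plain_text(element) -> str:
    """All text inside an element, whitespace collapsed ("" if missing)."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def figure_text(figure: ET.Element) -> tuple:
    """Return (alt, caption) for a TEI <figure>."""
    caption = _plain_text(figure.find(TEI + "head"))
    description = _plain_text(figure.find(TEI + "figDesc"))
    # Caption first, then the description, then the old generic text.
    alt = caption or description or "Illustration"
    return alt, caption


fig = ET.fromstring(
    '<figure xmlns="http://www.tei-c.org/ns/1.0">'
    "<head>The Emperor</head><figDesc>A small man on horseback</figDesc>"
    "</figure>"
)
print(figure_text(fig))  # ('The Emperor', 'The Emperor')
```

Whether the caption or the description makes the better alt text is a judgment call: a description usually helps screen-reader users more, while a caption avoids repeating nothing useful when the figure already has a visible caption line.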
#11
creator of calibre
It makes the HTML more semantic, so if someone wants to further process/convert the epub files, or a user wants to view them with a custom CSS stylesheet (calibre's epub viewer allows this), it will be easier.
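As an example of that kind of further processing: once headings carry `class="chapter_heading"`, a script can rebuild a table of contents without guessing which elements are chapters. A minimal sketch, assuming each chapter file is well-formed XHTML (named entities such as `&nbsp;` would have to be replaced with UTF-8 characters first); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def chapter_titles(xhtml: str) -> List[str]:
    """Text of every element whose class list includes "chapter_heading"."""
    root = ET.fromstring(xhtml)
    titles = []
    for element in root.iter():
        # class can hold several space-separated names.
        classes = (element.get("class") or "").split()
        if "chapter_heading" in classes:
            titles.append(" ".join("".join(element.itertext()).split()))
    return titles
```

Because the check is on the class rather than the tag, it keeps working whether the headings end up as `h1`, `h2` or styled `div`s.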
#12
Enthusiast
Okay thanks Kovid, I will certainly give that some thought.
Thread | Thread Starter | Forum | Replies | Last Post |
Fantasy Swift, Jonathan: Gulliver's Travels. v5 10 Nov 2013 | Jellby | ePub Books | 7 | 11-10-2013 05:07 AM |
Fantasy Swift, Jonathan: Gulliver's Travels. (Illustrated) V1. 24 May 2010 | nrapallo | IMP Books (offline) | 0 | 05-25-2010 01:08 AM |
Swift, Jonathan: Gulliver's Travels. v1, 3 Jan 2008 | Madam Broshkina | IMP Books | 0 | 01-03-2008 05:31 PM |
Swift, Jonathan: Gulliver's Travels. v1, 3 Jan 2008 | Madam Broshkina | Kindle Books | 0 | 01-03-2008 05:30 PM |
Swift, Jonathan: Gulliver's Travels. v1, 3 Jan 2008 | Madam Broshkina | BBeB/LRF Books | 0 | 01-03-2008 05:27 PM |