Converting Project Gutenberg books to ePub


derangedhermit
07-31-2012, 01:07 AM
I know that for many books they have computer-generated ePubs. I find them unsatisfactory.

I would like people who put effort into converting Project Gutenberg books to high-quality ePub to describe their methods, tools, etc.

- What format do you start with? Text (with the PG-unique markup)? HTML? ePub?
- Do you have a standard CSS file that you use? Mind sharing it?
- What tools do you use? Notepad++? Sigil?
- How hard do you work on collecting, editing, enhancing images?
- Do you use images of an original to work from?

What are the problems you run into, how do you deal with them, and how long does it take you to convert a book into a decent quality ePub with clean modern markup - a book you are satisfied with?

StoryEnthusiast
07-31-2012, 04:34 AM
I'd like to know too. I have chosen and released many Gutenberg stories on my portal site and my next step is to turn them into ePub books. This will benefit my learning experience.

Jellby
07-31-2012, 05:28 AM
- What format do you start with? Text (with the PG-unique markup)? HTML? ePub?

If there is a hand-made HTML, I generally use that. Otherwise I choose the plain text version (in utf8 or latin1 encoding if available).
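
For anyone starting from the plain text files, here is a minimal sketch of reading one when you are not sure whether it is utf8 or latin1. The function name `decode_pg_text` is made up for illustration, and the fallback order (try UTF-8 first, then Latin-1) is an assumption, not something Project Gutenberg prescribes.

```python
from typing import Tuple


def decode_pg_text(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used) for the raw bytes of a plain-text book.

    Hypothetical helper: tries UTF-8 first (dropping a byte order mark if
    present) and falls back to Latin-1, which accepts any byte sequence.
    """
    for encoding in ("utf-8-sig", "latin-1"):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            # Not valid in this encoding; try the next candidate.
            continue
    # Latin-1 never raises, so this line is only a safeguard.
    raise ValueError("could not decode input")


if __name__ == "__main__":
    text, used = decode_pg_text("café".encode("latin-1"))
    print(used, text)
```

Because Latin-1 never fails, a file in some other 8-bit encoding would decode without error but with wrong characters, so the result still needs a visual check.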

- Do you have a standard CSS file that you use? Mind sharing it?

Sort of. I copy a CSS from one of my most recent projects (or another book from the same "series") and then add or remove things as needed. The CSS files I use can be found in any of the books I've uploaded (e.g. The Adventures of Tom Sawyer).

- What tools do you use? Notepad++? Sigil?

vim, but I'm a nitpicker.

- How hard do you work on collecting, editing, enhancing images?

I try to find a good scan in The Internet Archive (http://www.archive.org), download the raw image files, rotate and crop the illustrations, remove the speckles and make sure the background is pure white (for black and white illustrations). Then I resize all illustrations by the same factor.
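
Two of the steps above (forcing the background to pure white and resizing every illustration by the same factor) can be sketched in plain Python. This is only an illustration of the idea, not the tooling actually used: the image is assumed to be a grid of 8-bit grayscale values, the threshold of 200 is an arbitrary assumption, and the function names are hypothetical. Rotation, cropping and speckle removal are not covered.

```python
def whiten_background(pixels, threshold=200):
    """Return a copy of a grayscale grid with near-white values set to 255.

    `pixels` is a list of rows, each a list of ints in 0..255.
    The default threshold is an assumed value; tune it per scan.
    """
    return [
        [255 if value >= threshold else value for value in row]
        for row in pixels
    ]


def scaled_size(width, height, factor):
    """Return the (width, height) that results from one common scale factor."""
    return (max(1, round(width * factor)), max(1, round(height * factor)))


if __name__ == "__main__":
    scan = [[250, 30], [199, 200]]
    print(whiten_background(scan))
    # Applying the same factor to every illustration keeps line weights consistent.
    for size in [(1200, 1800), (900, 600)]:
        print(scaled_size(size[0], size[1], 0.5))
```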

- Do you use images of an original to work from?

Sometimes, if I don't find a good scan online and I have an original I can scan.

AlexBell
07-31-2012, 05:45 AM
I know that for many books they have computer-generated ePubs. I find them unsatisfactory.

I would like people who put effort into converting Project Gutenberg books to high-quality ePub to describe their methods, tools, etc.

- What format do you start with? Text (with the PG-unique markup)? HTML? ePub?
- Do you have a standard CSS file that you use? Mind sharing it?
- What tools do you use? Notepad++? Sigil?
- How hard do you work on collecting, editing, enhancing images?
- Do you use images of an original to work from?

What are the problems you run into, how do you deal with them, and how long does it take you to convert a book into a decent quality ePub with clean modern markup - a book you are satisfied with?

For what it is worth, most of the ePubs I have done for the MobileRead library were originally from Project Gutenberg HTML files. I could not agree more with your dislike of their ePub files.

MobileRead won't let me upload ebook.css; if you send me a private message with your email address I'll email it to you.

I use the Coffee Cup HTML editor, mainly because I've dabbled in web design in the past.

How does one measure 'How hard' one works with images? I certainly spend some time searching for images, and sometimes use the images in the PG ePub file. If the images are poor quality I spend some time trying to improve them, but I'm certainly not an expert. The images in the last ebook I did (The Story of Francis Cludde by Stanley J. Weyman) I think were too dark and had poor contrast, and I think I've improved them.

I have found original images and used them - cf Robinson Crusoe by Daniel Defoe. I certainly spend time searching for cover images. Or do you mean do I produce original images? Only if I have to, and can't find an image to use as a cover - but even then I can usually find an image to put text on for a cover - cf Civil Disobedience and Other Essays by Henry David Thoreau.

I hope this helps.

PS zipped ebook.css attached. Thank you Harry.

HarryT
07-31-2012, 11:58 AM
MobileRead won't let me upload ebook.css; if you send me a private message with your email address I'll email it to you.


ZIP it and upload the ZIP file.

HarryT
07-31-2012, 12:00 PM
I try to find a good scan in The Internet Archive (http://www.archive.org), download the raw image files, rotate and crop the illustrations, remove the speckles and make sure the background is pure white (for black and white illustrations). Then I resize all illustrations by the same factor.


Yes, "archive.org" is by far the best source for scanned books with images. That's where I get most of mine from, too. Its scans are also, of course, a good source of printed editions to proofread against. I certainly wouldn't trust a PG book without proofing it - especially an older one.

derangedhermit
08-02-2012, 12:33 AM
Thanks for the replies. I respect y'all's work. Your comments mirror my beginning attempts: the PG books require proofing, and that takes time. Doing a proper ePub markup takes time - cleaning out the junk, mainly. Images add a lot, and are worth including, but often involve a fair bit of work to get to "publication quality".

So far it's about 40 hours of work for me to take an "average" PG book, clean it up and proofread it against another text, insert markup, clean up and insert images, and check the whole thing over. It's tough for me to think of doing many books at that rate, although I would like to.

How long, on average, does a book take for you?

HarryT
08-02-2012, 02:39 AM
It depends on the length of the book, of course. I generally spend about 1-1.5 hours a day proofreading my books (I do it in bed at night :) ), during which time I'll typically get through perhaps 20-30 pages. So a 400 page book would take me maybe 15-20 days to proofread.
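
The arithmetic behind that estimate, as a small sketch (the function name is hypothetical): dividing the page count by the nightly pace and rounding up gives roughly the same range.

```python
import math


def proofreading_days(total_pages, pages_per_day):
    """Return the number of daily sessions needed, rounding up partial days."""
    return math.ceil(total_pages / pages_per_day)


if __name__ == "__main__":
    # 400 pages at the faster and slower pace quoted above.
    print(proofreading_days(400, 30), proofreading_days(400, 20))
```

At 20-30 pages a day this works out to about 14 to 20 days for a 400 page book.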

mrmikel
08-02-2012, 07:56 AM
At archive.org, there are often multiple copies of books. Their text quality is often similar, but the image quality varies considerably depending on whether it was scanned as black and white or grayscale. So if you don't like the first one you downloaded, check and see if there is another in grayscale or color.

derangedhermit
08-05-2012, 12:33 PM
Does anyone feed the proofread corrected copies back to PG? It seems like that would be a very good and low-effort thing to do.

HarryT
08-05-2012, 02:15 PM
Does anyone feed the proofread corrected copies back to PG? It seems like that would be a very good and low-effort thing to do.

I've offered them some of my stuff, but they don't seem interested. Perhaps for legal reasons, because my sources are often not American editions.

Jellby
08-06-2012, 05:19 AM
Does anyone feed the proofread corrected copies back to PG? It seems like that would be a very good and low-effort thing to do.

I submit the list of errors I find to PG. They sometimes apply the corrections, but it may take some time. They must check all corrections manually, ideally make sure they correspond to the same edition that was originally used, apply the changes to all the formats, etc. I guess they are short of manpower.
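
One way to build such a list of errors, assuming you kept both the original text and your corrected copy, is to compare them line by line. The sketch below uses Python's standard `difflib`; the function name and the output shape are made up for illustration and are not a format PG asks for.

```python
import difflib


def list_corrections(original_lines, corrected_lines):
    """Return (line_number, old_line, new_line) for each changed line.

    Only one-for-one line replacements are reported; inserted or deleted
    lines are ignored in this sketch.
    """
    matcher = difflib.SequenceMatcher(
        None, original_lines, corrected_lines, autojunk=False
    )
    corrections = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for offset in range(i2 - i1):
                corrections.append(
                    (
                        i1 + offset + 1,  # 1-based line number in the original
                        original_lines[i1 + offset],
                        corrected_lines[j1 + offset],
                    )
                )
    return corrections


if __name__ == "__main__":
    original = ["It was a dark nigbt.", "The rain fell.", "He arid nothing."]
    corrected = ["It was a dark night.", "The rain fell.", "He said nothing."]
    for number, old, new in list_corrections(original, corrected):
        print(f"line {number}: {old!r} -> {new!r}")
```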

A much more effective way of helping is Distributed Proofreaders, where the changes are made before the books are submitted to PG. There's a "smoothreading" stage for those who just want to read a book.