04-26-2008, 03:32 AM #1
cerement
Groupie
cerement knows what time it is
Posts: 170
Karma: 2000
Join Date: Apr 2008
Location: San José, CA
Device: Amazon Kindle 1, Sony PRS-300, Amazon Kindle 3
Gutenberg to Kindle, the Long Way

My apologies if this starts to ramble. I'll provide background in this post and then the details in my next post. This project started when I decided I wanted to learn how to create an ebook for my Kindle from a Project Gutenberg text, after being dissatisfied with the quality of the version offered among Mobipocket's free books.

Invaluable resources include:
Project Gutenberg (of course)
Mobipocket eBook guide by HarryT
Quick Tip: Margins by Joshua Tallent
Google Books

Unfortunately, Amazon's own DTP site wasn't very informative, beyond one user's tip that the best results come from using a .mobi file as the master file you send to Amazon DTP ...
04-26-2008, 04:34 AM #2
-- First, grab the text file for your book of choice from Project Gutenberg
If an HTML version is available (and you can live with its formatting), grab that instead and save yourself some hassle.

-- Open the file up in the text editor of your choice
After trying out several, I've become comfortable with EmEditor Free, but it really is just personal choice. Just make sure that it supports regular expressions; this will save you no end of hassle (I'm also going to use a lot of regex notation further down).

-- Plan for a nice long session of searching and replacing
Before you begin, get to know your text and, more importantly, the idiosyncrasies of the transcriber who prepped it for PG. Did they use underscores, asterisks, or something else for italics? Did they create a funky system to indicate accented characters? Is the book verse or prose (or worse, a mix)? If you're lucky, Google Books has the scanned copy, so you can get a visual idea of how the book looked.
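If you'd rather not eyeball the whole file, a small Python script can do a first survey. This is a minimal sketch, not part of any PG tooling: `survey` is a hypothetical helper name, and it assumes you've already read the file as Latin-1 (e.g. `open(path, encoding="latin-1").read()`). It counts likely italics markers (`_word_` and `*word*`), tallies the characters in [\xA0-\xFF], and counts control characters other than tab and line breaks.

```python
import re
from collections import Counter

def survey(text: str) -> dict:
    """Summarize a PG text's quirks before any search and replace.

    Hypothetical helper; the categories are examples, not an exhaustive list.
    Assumes `text` was decoded as Latin-1.
    """
    return {
        # Likely italics conventions: _word_ and *word* on a single line
        "underscore_spans": len(re.findall(r"_[^_\n]+_", text)),
        "asterisk_spans": len(re.findall(r"\*[^*\n]+\*", text)),
        # Latin-1 upper half: accented letters, symbols, no-break space
        "high_chars": Counter(re.findall(r"[\xA0-\xFF]", text)),
        # C0 controls except tab/LF/CR, plus DEL and the C1 range
        "control_chars": len(re.findall(r"[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]", text)),
    }
```

A nonzero `control_chars` count often means Windows-1252 curly quotes or dashes (bytes 0x80-0x9F) crept into the file.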
  1. Decide what you want to do with the Gutenberg boilerplate (if you can figure out how to work this thing into your layout, then you're a better man than I am)
  2. Remove double spaces, spaces at the beginning of lines, and spaces at the end of lines. (Double spaces are a leftover artifact of typewriters.)
  3. Remove excess blank lines and convert everything into paragraphs (markup will come later). The simplest way takes three steps: search and replace double returns \n\n with something uncommon like @@@@@, search and replace single returns \n with a space, then search and replace your marker @@@@@ with double returns \n\n
  4. Escape the three special characters: & to &amp;, < to &lt;, and > to &gt; (replace the ampersand first!)
  5. Encode em dashes as &mdash;, en dashes as &ndash;, and ellipses as &hellip;. Watch out for a special construct in older texts that refers to someone anonymously by the first letter of their name: H---- (an initial plus two em dashes)
  6. Encode any other special characters and accented characters (search for [\xA0-\xFF] initially to find them). While you're at it, make sure there are no characters in the ranges [\x00-\x1F] (apart from tabs and line breaks) or [\x7F-\x9F]; these are control-character ranges in Latin-1, not printable text
  7. Now the hardest part: converting all the " and ' marks to curly quotes! There are a LOT of special cases that have to be caught before the primary conversion. A few of them include measurements (5'10"), abbreviated years ('78), "'nested' quotes", and 'British' vs. "American" quotes (if the text uses British quotes, some of the steps below will have to be reversed)
    1. Number followed by quote \d', \d" (figure out some way to mask it for later)
    2. Opening nested quotes "' to &ldquo;&lsquo;
    3. Whitespace single quote number \s'\d to &rsquo;
    4. Whitespace single quote \s' to &lsquo;
    5. Line beginning single quote ^' to &lsquo;
    6. Leftover single quotes to &rsquo;
    7. Whitespace double quote \s" to &ldquo;
    8. Line beginning double quote ^" to &ldquo;
    9. Leftover double quotes to &rdquo;
  8. Search for double returns and add in paragraph marks <p>
  9. Add in the HTML header and footer
  10. Mark up your italics <em>, bold <strong>, superscript <sup>, and subscript <sub>
  11. Go back and look for items with special line breaks, indents, and blockquotes. If you don't want a paragraph to indent, add class="noind" to its <p> tag and add the line p.noind {text-indent:0} to your style section. Space above a line can be adjusted by adding a height attribute, e.g. <h4 height="2em">
  12. Mark up headings and chapter heads <h1> to <h6>
  13. Link the table of contents to chapters, mark the table of contents with <a name="toc">
  14. Mark your starting page with <a name="start">
  15. Prep and link in any images (keep image sizes to at most 600 pixels wide by 800 pixels tall)
  16. If chapters are a decent length, they can be separated by page breaks using the <mbp:pagebreak /> tag
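Steps 2 through 5 are mechanical enough to script. Here's a minimal Python sketch, assuming the transcriber followed the common PG convention of `--` for an em dash (check your text first) and that the marker `@@@@@` never appears in the book; `clean` and `MARKER` are just illustrative names. Because `--` is replaced pairwise, `H----` comes out as the initial plus two em dashes, but an odd run like `---` will need a manual look. En dashes aren't handled, since there's no reliable plain-text marker for them.

```python
import re

MARKER = "@@@@@"  # assumption: this string never occurs in the book text

def clean(text: str) -> str:
    text = text.replace("\r\n", "\n")  # normalize CRLF line endings

    # Step 2: strip spaces at line starts/ends, collapse runs of spaces
    lines = [re.sub(r" {2,}", " ", line.strip(" \t")) for line in text.split("\n")]
    text = "\n".join(lines)

    # Step 3: squeeze excess blank lines, then unwrap hard line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)
    text = text.replace("\n\n", MARKER).replace("\n", " ").replace(MARKER, "\n\n")
    text = text.strip()

    # Step 4: escape the three special characters, ampersand first
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

    # Step 5: assumes PG-style "--" for em dashes; "H----" -> two em dashes
    text = text.replace("--", "&mdash;")
    text = text.replace("...", "&hellip;")
    return text
```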

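The order of the quote rules in step 7 matters, so here they are in sequence as a Python sketch. Assumptions: American quotes (swap the single and double rules for British texts), the control characters \x01 and \x02 are free to use as temporary masks (step 6 should have cleared them out), and masked measurements are restored as &prime;/&Prime;, which is only one possible choice. `curl_quotes` is a hypothetical name, and it inherits the rules' blind spots: a `'tis`-style elision after a space becomes &lsquo;, a quote right after an opening parenthesis becomes a closing quote, and a form like `1890's` gets masked as a measurement, so proofread the output.

```python
import re

# Temporary masks; assumes no control characters remain in the text (step 6)
PRIME, DPRIME = "\x01", "\x02"

def curl_quotes(text: str) -> str:
    # 1. Mask measurements like 5'10" so they are not curled
    text = re.sub(r"(\d)'", r"\1" + PRIME, text)
    text = re.sub(r'(\d)"', r"\1" + DPRIME, text)
    # 2. Opening nested quotes
    text = text.replace("\"'", "&ldquo;&lsquo;")
    # 3. Abbreviated years: whitespace, quote, digit -> apostrophe
    text = re.sub(r"(\s)'(\d)", r"\1&rsquo;\2", text)
    # 4-5. Opening single quotes after whitespace or at a line start
    text = re.sub(r"(\s)'", r"\1&lsquo;", text)
    text = re.sub(r"^'", "&lsquo;", text, flags=re.M)
    # 6. Every remaining single quote is a closing quote or apostrophe
    text = text.replace("'", "&rsquo;")
    # 7-8. Opening double quotes after whitespace or at a line start
    text = re.sub(r'(\s)"', r"\1&ldquo;", text)
    text = re.sub(r'^"', "&ldquo;", text, flags=re.M)
    # 9. Every remaining double quote is a closing quote
    text = text.replace('"', "&rdquo;")
    # Unmask measurements as prime marks (one possible choice)
    return text.replace(PRIME, "&prime;").replace(DPRIME, "&Prime;")
```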
And at this point, you should be ready to head over to HarryT's tutorial with an HTML file ready to be converted into a Mobipocket file for your Kindle.
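For the assembly end of the list (steps 8, 9, 12, 13, 14 and 16), a Python sketch that stitches cleaned chapters into one file might look like this. It's illustrative only: `build_html`, the `ch1`, `ch2`, ... anchor names, and the choice of <h2> for chapter heads are my assumptions. It follows the markup described above (`<a name="toc">`, `<a name="start">`, `<mbp:pagebreak />`, `p.noind`) rather than any official Amazon template.

```python
def build_html(title: str, chapters: list[tuple[str, str]]) -> str:
    """Assemble one Mobipocket-style HTML file (hypothetical helper).

    chapters: (heading, body) pairs; body paragraphs are separated by
    blank lines and are assumed to be already escaped and encoded.
    """
    parts = [
        "<html><head>",
        f"<title>{title}</title>",
        "<style>p.noind {text-indent:0}</style>",  # step 11 style rule
        "</head><body>",
        '<a name="toc"></a><h2>Contents</h2>',  # step 13 TOC target
    ]
    # Step 13: table of contents linked to chapter anchors
    for i, (heading, _) in enumerate(chapters, 1):
        parts.append(f'<p class="noind"><a href="#ch{i}">{heading}</a></p>')
    for i, (heading, body) in enumerate(chapters, 1):
        parts.append("<mbp:pagebreak />")  # step 16
        if i == 1:
            parts.append('<a name="start"></a>')  # step 14
        parts.append(f'<h2><a name="ch{i}"></a>{heading}</h2>')  # step 12
        # Step 8: each blank-line-separated block becomes a paragraph
        for para in body.split("\n\n"):
            parts.append(f"<p>{para}</p>")
    parts.append("</body></html>")  # step 9 footer
    return "\n".join(parts)
```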

Last edited by cerement; 04-26-2008 at 04:38 AM.
04-26-2008, 04:55 AM #3
(Last post, I promise)

I would desperately like to get my hands on the HTML file that was used to generate the user guide that came on our Kindles, just so I can see what Amazon-approved ebook markup looks like. Failing that, I'm providing these files in the hope that someone else doesn't have to go through the same hassle.

All of the files are based on the original text file for Project Gutenberg text 17972: Round About the Carpathians by Andrew F. Crosse

The final HTML files (and images) are attached below. The HTML file shows not only all the character encoding but also all the specialty markup for Mobipocket ebooks (and for Kindle ebooks specifically).
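For step 6, the accented characters can be turned into named entities automatically rather than by hand. A minimal Python sketch using the standard library's `html.entities.codepoint2name` table; `encode_entities` is a hypothetical name, it should run after the &, <, > escaping of step 4, and any character without a named entity falls back to a numeric reference such as &#337;. Whether the Kindle's font actually has a glyph for every such character is a separate question, so check the result on the device.

```python
from html.entities import codepoint2name

def encode_entities(text: str) -> str:
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. U+00E9 -> &eacute;
        else:
            out.append(f"&#{cp};")  # numeric fallback for everything else
    return "".join(out)
```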

I've posted the final results of all the above work in the Mobi/PRC Books forum

For the inevitable "contrast and compare", the free Mobipocket book that started this journey

As well as the Google Books scan showing what the book really looked like (in black and white)
Attached file: 17972-8-html.zip (535.5 KB)
04-26-2008, 12:19 PM #4
cfw123
Enthusiast
cfw123 began at the beginning.
 
Posts: 41
Karma: 10
Join Date: Dec 2007
Location: San Jose, Calif., USA
Device: Kindle
Thanks for this helpful file; don't worry about running on, content is the key here. I've had my Kindle since Dec. 4th, and with my 8 GB memory card I already have over 500 books stored, most from free sources. But I want to be able to edit them, give them better titles and author names, and hopefully categorize them, which the Kindle doesn't support (yet). I've discovered that we badly need format conversion programs that allow many default settings, to make things the way "you" want them. And Gutenberg is a great source if we can make its books look better on a Kindle, and you've helped a lot. But we really need much, much more available. Amazon just isn't about to help, either, due to their hang-up on DRM, which was obviously forced on them by publishers, since they fight DRM for music.

Charles Wilkes, San Jose, Calif.
04-26-2008, 12:32 PM #5
There's an entirely separate need which covers all ebook readers: the ability to scan printed material and have it automatically digitized and converted into a meta-format ebook, which could then be converted specifically to fit a reader such as the Kindle, or many others. It should adapt to print fonts, perhaps with an initial learning stage. It should adapt to single- or multi-column formats. It should flow all text into paragraphs with no fixed line lengths, so the text can reflow on a reader. It should allow one to scan both pages of an open book on a scanner large enough to read both pages at the same time; otherwise it must read a page at a time, which makes more work for the person doing the scanning and thus should be avoided if possible. It should digitize illustrations into JPEGs if possible, retaining color in the meta-format version but supporting b&w only in the second conversion step that adapts to a specific reader (since some, like the Kindle, do not presently support color). Someday we will also have color, I'm sure, once color e-paper arrives, with better and better down the timeline into the far future (for current-generation users anyway). But until then, there are much better ways to adapt color JPEGs to multi-level grayscale illustrations than those we normally see on our Kindle and other similar readers.
Charles Wilkes, San Jose, Calif.