Old 09-09-2007, 07:40 PM   #1
phrodod
Enthusiast
Posts: 43
Karma: 28
Join Date: Aug 2007
Device: Sony Reader PRS-500
New conversion method: txt->rst->html->lrf

Hi all;

I've just gone through my first e-book creation experiment, and I was looking for an easy way to convert the Project Gutenberg (PG) txt files to the Reader's format. reStructuredText (rst) is a simple markup format designed to be both readable as plain text and automatically convertible into other formats. It's the format used by Python's Docutils package. One program included with that package, rst2html, can convert lightly modified PG text files into HTML. I tried it out with Anna Karenina (by Tolstoy). Any feedback on the process is welcome, but I am happy with the result so far. (Of course, I'm only about 50 pages in on the Reader...) If you'd like to view the results, check the Reader downloads page.

I discovered that the process is actually pretty easy, but with a book as large as this one is, the Table of Contents (TOC) is difficult to navigate (many pages). So I went for a compromise. I split the original text file into separate files for each part, and had rst2html automatically generate a TOC for the part.
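The splitting step described above could be sketched roughly as follows. This is only an illustration, not the script actually used for the book: the heading patterns ("PART ONE", "Chapter 1") and the function names are assumptions about how the source text is laid out. The `.. contents::` line is the reStructuredText directive that asks rst2html to generate a table of contents for the file.

```python
import re

# Assumption: each part starts with a line such as "PART ONE" on its own,
# and each chapter with a line such as "Chapter 12". Adjust to the real text.
PART_HEADING = re.compile(r"^PART [A-Z]+[ \t]*$", re.MULTILINE)
CHAPTER_HEADING = re.compile(r"^(Chapter \d+)[ \t]*$", re.MULTILINE)


def split_into_parts(text):
    """Return a list of (heading, body) tuples, one per part."""
    matches = list(PART_HEADING.finditer(text))
    parts = []
    for i, match in enumerate(matches):
        # A part runs until the next part heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        parts.append((match.group(0).strip(), text[match.end():end].strip()))
    return parts


def part_to_rst(heading, body):
    """Wrap one part as an rst document with its own table of contents."""
    # Underline chapter lines so rst treats them as section titles.
    body = CHAPTER_HEADING.sub(
        lambda m: f"{m.group(1)}\n{'-' * len(m.group(1))}", body
    )
    return f"{heading}\n{'=' * len(heading)}\n\n.. contents::\n\n{body}\n"
```

Each string returned by `part_to_rst` would then be written to its own file (for example a hypothetical `part1.rst`) before running rst2html on it.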

I then created a page of links to the other pages, ran the whole collection through rst2html to generate html pages, then used html2lrf to convert that to an e-book. I believe the results are quite nice.
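As a rough sketch of that last step, the index page and the two conversion passes might look like this. The file names are made up, the script name `rst2html.py` is an assumption (some Docutils installs call it `rst2html`), and html2lrf is called with no options because its flags are not covered here; check `html2lrf --help` for output and title settings.

```python
import subprocess


def build_index_rst(title, parts):
    """Build an rst page of links; parts is a list of (label, rst_filename)."""
    lines = [title, "=" * len(title), ""]
    for label, rst_name in parts:
        html_name = rst_name.rsplit(".", 1)[0] + ".html"
        # rst syntax for a hyperlink with an embedded target.
        lines.append(f"* `{label} <{html_name}>`_")
    return "\n".join(lines) + "\n"


def conversion_commands(rst_files, index_html="index.html"):
    """Return the command lines: one rst2html call per file, then html2lrf."""
    commands = [
        ["rst2html.py", name, name.rsplit(".", 1)[0] + ".html"]
        for name in rst_files
    ]
    # html2lrf starts from the index page, which links to the part pages.
    commands.append(["html2lrf", index_html])
    return commands


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Building the command lists separately from running them makes it easy to print and inspect them before anything touches the files.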

The keys for this are: comfort with a good text editor (I use emacs), a full Python install (I installed Cygwin on my PC and use the Python that came with it), Docutils (search Google for the installer, then add it to your Python distribution), and comfort using the command line. I do all my conversion work there.
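A quick way to check that these pieces are in place is a small script like the one below. It is only a sketch: the executable names it looks for (`rst2html.py`, `rst2html`, `html2lrf`) are assumptions and may differ on your install.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report which of the tools mentioned above can be found."""
    return {
        # Is the Docutils package importable from this Python?
        "docutils": importlib.util.find_spec("docutils") is not None,
        # The converter script is named differently across Docutils versions.
        "rst2html": any(
            shutil.which(name) is not None for name in ("rst2html.py", "rst2html")
        ),
        "html2lrf": shutil.which("html2lrf") is not None,
    }


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool}: {'found' if found else 'missing'}")
```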

I'll post detailed instructions after I've done a couple of additional books.

Phrodod
Old 09-09-2007, 08:18 PM   #2
kovidgoyal
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Umm txt2lrf already supports a lightweight txt markup language, namely, markdown.
Old 09-10-2007, 12:44 PM   #3
phrodod
Thanks. I didn't know that. I'll go read up on it!
Old 09-12-2007, 02:53 PM   #4
phrodod
Quote:
Originally Posted by kovidgoyal
Umm txt2lrf already supports a lightweight txt markup language, namely, markdown.
After spending some time playing with this last night, I found that either a) I need better documentation, or b) Markdown is much less capable than reStructuredText. I also found converting Markdown to HTML to be much slower than running rst2html. For Anna Karenina, the difference was several minutes. Given my familiarity with rst and the ease with which I am able to use it, I'll probably continue with it unless I find a compelling case to switch.

On a separate note, does html2lrf have a way to make nested Reader TOCs? I notice that Sony's operations guide has that, and I'd find it much simpler to navigate using the number buttons and the page buttons than the joystick for getting to individual chapters. But if they all go onto a single, huge TOC page, I'm looking at something on the order of 200 chapters (or 20 pages of TOC entries). I'd love to make that simpler to navigate by putting each part on its own page. Part 1 has 34 chapters already, so the sub-TOC for Part 1 would STILL take 4 pages!
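The page counts above can be checked with a few lines of arithmetic. The figure of 10 TOC entries per page is an assumption inferred from the numbers in this post (about 200 chapters giving about 20 pages), not a documented Reader limit.

```python
import math

# Assumption: roughly 10 TOC entries fit on one Reader page.
ENTRIES_PER_PAGE = 10


def toc_pages(entries, per_page=ENTRIES_PER_PAGE):
    """Number of TOC pages needed for the given number of entries."""
    return math.ceil(entries / per_page)


def nested_toc_cost(chapters_per_part):
    """Return (pages in the top-level TOC, pages in the longest sub-TOC)."""
    top_level = toc_pages(len(chapters_per_part))
    longest_sub = max(toc_pages(count) for count in chapters_per_part)
    return top_level, longest_sub


print(toc_pages(200))             # flat TOC for about 200 chapters
print(nested_toc_cost([34, 35]))  # two parts with 34 and 35 chapters
```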

Thanks.

Phrodod
Old 09-12-2007, 03:17 PM   #5
kovidgoyal
Considering that you can embed HTML in markdown, I find it hard to believe it's less capable. Some examples?

The operations guide is a PDF. As far as I know the LRF format doesn't have support for defining a hierarchical TOC.
Old 09-12-2007, 09:10 PM   #6
phrodod
Quote:
Originally Posted by kovidgoyal
Considering that you can embed HTML in markdown, I find it hard to believe it's less capable. Some examples?
Technically, you're correct. However, I find that the shortcuts available in rst (to name two: creating a document title and subtitle much as you would in Markdown, and auto-generating a multi-level TOC) allow me to accomplish the same task more quickly than I could by adding HTML to the document.
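To illustrate those two shortcuts, here is a small sketch that builds an rst document with a title, a subtitle, and a two-level generated TOC. The helper names and the sample part and chapter titles are made up for the example. In Docutils, a lone top-level heading is promoted to the document title and a lone second-level heading right after it to the subtitle, while the `.. contents::` directive with `:depth: 2` produces the multi-level TOC.

```python
def heading(text, char, overline=False):
    """Return an rst section title adorned with the given character."""
    bar = char * len(text)
    if overline:
        return f"{bar}\n{text}\n{bar}\n"
    return f"{text}\n{bar}\n"


def build_document(title, subtitle, parts):
    """parts is a list of (part_title, [chapter_title, ...]) pairs."""
    blocks = [
        heading(title, "=", overline=True),     # becomes the document title
        heading(subtitle, "-", overline=True),  # becomes the subtitle
        ".. contents:: Table of Contents\n   :depth: 2\n",
    ]
    for part_title, chapters in parts:
        blocks.append(heading(part_title, "="))
        for chapter in chapters:
            blocks.append(heading(chapter, "-"))
            blocks.append("(chapter text goes here)\n")
    # Blank lines between blocks keep the rst parser happy.
    return "\n".join(blocks)


doc = build_document(
    "Anna Karenina", "Leo Tolstoy", [("Part One", ["Chapter 1", "Chapter 2"])]
)
print(doc)
```

If Docutils is installed, passing `doc` to `docutils.core.publish_string(doc, writer_name="html")` should give the same HTML that rst2html would produce from a file with this content.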

I appreciate having Markdown support, and I may use it in the future for simple documents (where a single-level TOC is sufficient!), but in this case, I found that it felt insufficient to me.

One other item I discovered. If I have multiple H2s in an HTML document that all have identical text, html2lrf only adds the first one to the TOC. So when I first attempted to convert Anna Karenina, I ended up with Part 1, Chapter 1, ..., Chapter 34, Part 2, Chapter 35, Part 3, Part 4, ...

Part 2 has 35 chapters, but 1-34 are named identically to Part 1's 34 chapters, so they never showed up in the Reader's TOC menu. OTOH, they showed up fine in the in-line TOC in the book.
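Until that is fixed, one possible workaround (my own guess, not something html2lrf documents) would be to make every chapter heading unique by prefixing it with its part before conversion. The sketch below assumes headings appear on their own lines as "Part N" and "Chapter N"; both patterns and the function name are hypothetical.

```python
import re

# Assumption: headings look exactly like "Part 2" and "Chapter 17".
PART_RE = re.compile(r"^Part (\d+)$")
CHAPTER_RE = re.compile(r"^Chapter (\d+)$")


def make_headings_unique(lines):
    """Rewrite 'Chapter N' as 'Part P, Chapter N' using the enclosing part."""
    current_part = None
    result = []
    for line in lines:
        stripped = line.strip()
        part = PART_RE.match(stripped)
        chapter = CHAPTER_RE.match(stripped)
        if part:
            current_part = part.group(1)
        if chapter and current_part is not None:
            result.append(f"Part {current_part}, Chapter {chapter.group(1)}")
        else:
            result.append(line)
    return result
```

This should be run on the plain text before adding rst underlines, since the longer headings need longer underlines.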

Thanks for all your hard work!

Phrodod
Old 09-12-2007, 09:32 PM   #7
kovidgoyal
Yeah, that's a bug. Open a report and I'll fix it as soon as I get some time.
Old 09-13-2007, 02:50 AM   #8
phrodod
Quote:
Originally Posted by kovidgoyal
Yeah that's a bug. open a report and i'll fix it as soon as i get some time.
It's ticket #199. Thanks!!