Old 12-03-2008, 03:34 PM   #2
kovidgoyal
creator of calibre
Posts: 45,235
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Since you're using UTF-8 encoding anyway, I suggest replacing the numeric entities with the actual UTF-8 characters. That makes for smaller files and easier parsing.
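
A rough sketch of what I mean, in Python. The function name `replace_numeric_entities` is made up for illustration, and I'm assuming the input is already a decoded string that will be written back out as UTF-8. Entities for `&`, `<`, `>` and the quote characters are left alone, since turning those into literal characters would break the markup.

```python
import re

# Characters that have to stay escaped for the HTML to remain well-formed.
_KEEP_ESCAPED = {"&", "<", ">", '"', "'"}

# Matches decimal (&#233;) and hexadecimal (&#xE9;) numeric character references.
_NUMERIC_ENTITY = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def replace_numeric_entities(text: str) -> str:
    """Replace numeric character references with the characters they name."""

    def _substitute(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Surrogate code points cannot be encoded as UTF-8; leave them as-is.
        if 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        try:
            char = chr(code_point)
        except (ValueError, OverflowError):
            # Out-of-range reference: keep the original text untouched.
            return match.group(0)
        if char in _KEEP_ESCAPED:
            return match.group(0)
        return char

    return _NUMERIC_ENTITY.sub(_substitute, text)
```

When writing the result to disk, open the file with `encoding="utf-8"` so the declared encoding and the actual bytes agree. Each accented Latin letter then takes 2 bytes instead of the 6 or so its entity needs.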

Are you planning to release your scripts for converting the Gutenberg txt markup to HTML? I've found that Gutenberg books vary a lot in their markup. How well does your script handle that?