Old 04-14-2011, 07:38 PM   #1
Wintermute
Join Date: Apr 2011
Device: Kindle 3
Converting pandoc generated HTML to ePUB with Calibre

Hi,

I recently formatted a (very long) book with pandoc, which I find really convenient for generating HTML. I then convert the HTML to ePUB with calibre. Pandoc can generate ePUB itself, but that process is almost fully automatic and offers little customization. For instance, the TOC depth cannot be changed (only chapters appear in the TOC). That's why I use calibre for the HTML-to-ePUB step: from the command line I can control a lot about the generated ePUB (cover, TOC, etc.).
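
For context, the kind of invocation I mean looks roughly like the sketch below. The file names are placeholders and the XPath expressions are just one possible choice, not my exact settings; `--cover`, `--language`, `--level1-toc` and `--level2-toc` are ebook-convert options.

```python
import subprocess


def build_convert_command(html_path, epub_path, cover_path):
    """Return the ebook-convert argument list (nothing is executed here)."""
    return [
        "ebook-convert", html_path, epub_path,
        "--cover", cover_path,        # cover image (placeholder path)
        "--language", "es",           # the book is in Spanish
        "--level1-toc", "//h:h1",     # h1 headings become top-level TOC entries
        "--level2-toc", "//h:h2",     # h2 headings are nested under them
    ]


def convert(html_path, epub_path, cover_path):
    """Run the conversion; requires calibre's ebook-convert on the PATH."""
    subprocess.check_call(build_convert_command(html_path, epub_path, cover_path))


if __name__ == "__main__":
    # Only print the command; call convert(...) to actually run it.
    print(" ".join(build_convert_command("book.html", "book.epub", "cover.jpg")))
```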

The problem is that the book I'm converting is in Spanish, and some of the chapter titles contain accented characters (á é í ó ú). For every section heading (h1, h2, etc.) pandoc generates an id you can use to refer to that element in the text. For example, if a chapter is titled "Introducción", pandoc writes this into the HTML:

Code:
<h1 id="introducción">Introducción</h1>
Calibre crashes when an hX heading (and therefore its generated id) contains non-ASCII characters.

Here's calibre's output.

Code:
Converting ebook with calibre
1% Converting input to HTML...
InputFormatPlugin: HTML Input running
on /home/Literature/Calibre-tests y pruebas/test-pandoc/pandoc-example.html
Language not specified
Building file list...
Normalizing filename cases
Rewriting HTML links
34% Running transforms on ebook...
Merging user specified metadata...
Detecting structure...
        Detected chapter: My Book
        Detected chapter: Chapter One
        Detected chapter: Chapter Two
Auto generated TOC with 12 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
67% Creating EPUB Output
Traceback (most recent call last):
  File "/usr/bin/ebook-convert", line 19, in <module>
    sys.exit(main())
  File "/usr/lib/calibre/calibre/ebooks/conversion/cli.py", line 279, in main
    plumber.run()
  File "/usr/lib/calibre/calibre/ebooks/conversion/plumber.py", line 1018, in run
    self.opts, self.log)
  File "/usr/lib/calibre/calibre/ebooks/epub/output.py", line 169, in convert
    split(self.oeb, self.opts)
  File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 57, in __call__
    self.split_item(item)
  File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 64, in split_item
    page_breaks, page_break_ids = self.find_page_breaks(item)
  File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 123, in find_page_breaks
    page_breaks_.append((XPath('//*[@id=%r]'%id),
  File "xpath.pxi", line 446, in lxml.etree.XPath.__init__ (src/lxml/lxml.etree.c:115005)
  File "xpath.pxi", line 214, in lxml.etree._XPathEvaluatorBase._raise_parse_error (src/lxml/lxml.etree.c:112698)
lxml.etree.XPathSyntaxError: Invalid predicate
Is this normal calibre behaviour, or is it a pandoc bug?
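
In the meantime, one workaround that might sidestep the crash is to strip the accents from the ids before handing the HTML to calibre. This is only a minimal sketch, assuming the non-ASCII id really is the trigger and that internal links use plain `href="#..."` fragments; `ascii_fold` and `fold_ids` are hypothetical helper names, and I have not checked the result against calibre.

```python
import re
import unicodedata

# Matches id="..." and href="..." attributes (but not e.g. data-id="...").
ATTR = re.compile(r'(?<![\w-])(id|href)="([^"]*)"')


def ascii_fold(text):
    """Drop diacritics, e.g. 'introducción' -> 'introduccion'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def fold_ids(html):
    """ASCII-fold every id and every in-document (#fragment) link."""
    def repl(match):
        name, value = match.group(1), match.group(2)
        # Leave external links alone; only "#..." fragments refer to ids.
        if name == "href" and not value.startswith("#"):
            return match.group(0)
        return '%s="%s"' % (name, ascii_fold(value))
    return ATTR.sub(repl, html)


if __name__ == "__main__":
    print(fold_ids('<h1 id="introducción">Introducción</h1>'))
```

The visible heading text is left untouched; only the attribute values change. Two ids that differ only by their accents would collide after folding, so this is not safe for every document.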

Thanks in advance for your help.