I recently formatted a very long book with pandoc, which I find convenient for generating HTML. I then convert the HTML to ePUB with calibre. Pandoc can generate ePUB itself, but the process is highly automatic and offers few customization options; for instance, the TOC levels cannot be customized (only chapters appear in the TOC). That's why I use calibre for the HTML-to-ePUB step: from the command line I can control many aspects of the generated ePUB (cover, TOC, etc.).
The problem is that the book I'm converting is in Spanish, and some of the chapter titles contain accented characters (á é í ó ú). For every section header (h1, h2, etc.), pandoc generates an id that you can use to refer to that element in the text. For example, if a chapter is entitled "Introducción", pandoc generates this in the HTML:
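Pandoc's manual describes how these identifiers are derived: keep letters, digits, `_`, `-` and `.`, turn spaces into hyphens, lowercase everything, and drop everything before the first letter. The Python sketch below approximates those rules for plain-text titles, to show why an accented letter survives into the id. `pandoc_style_id` and `strip_accents` are hypothetical helper names, and the `ascii_only` flag is an assumption meant to imitate pandoc's documented `ascii_identifiers` extension; this is an illustration, not pandoc's actual code.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose accented letters (NFKD) and drop every non-ASCII code point."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if ch.isascii())


def pandoc_style_id(title: str, ascii_only: bool = False) -> str:
    """Approximate pandoc's default header-identifier rules for plain-text titles.

    Simplified sketch: it ignores inline formatting, links and footnotes, and
    does not add the numeric suffixes pandoc uses for duplicate ids.
    """
    if ascii_only:
        # Hypothetical imitation of the ascii_identifiers extension
        title = strip_accents(title)
    chars = []
    for ch in title:
        if ch.isalnum() or ch in "_-.":
            chars.append(ch.lower())  # accented letters count as alphanumeric
        elif ch.isspace():
            chars.append("-")  # each space or newline becomes a hyphen
        # any other punctuation (?, ¿, *, ...) is dropped
    ident = "".join(chars)
    # Remove everything up to the first letter
    start = 0
    while start < len(ident) and not ident[start].isalpha():
        start += 1
    return ident[start:] or "section"
```

With these rules, `pandoc_style_id("Introducción")` returns `introducción`, keeping the accented "ó", while `pandoc_style_id("Introducción", ascii_only=True)` returns `introduccion`.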
Calibre crashes whenever one of these hX headers contains non-ASCII characters.
Here's calibre's output:
Converting ebook with calibre
1% Converting input to HTML...
InputFormatPlugin: HTML Input running
on /home/Literature/Calibre-tests y pruebas/test-pandoc/pandoc-example.html
Language not specified
Building file list...
Normalizing filename cases
Rewriting HTML links
34% Running transforms on ebook...
Merging user specified metadata...
Detected chapter: My Book
Detected chapter: Chapter One
Detected chapter: Chapter Two
Auto generated TOC with 12 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
67% Creating EPUB Output
Traceback (most recent call last):
  File "/usr/bin/ebook-convert", line 19, in <module>
  File "/usr/lib/calibre/calibre/ebooks/conversion/cli.py", line 279, in main
  File "/usr/lib/calibre/calibre/ebooks/conversion/plumber.py", line 1018, in run
  File "/usr/lib/calibre/calibre/ebooks/epub/output.py", line 169, in convert
  File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 57, in __call__
  File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 64, in split_item
    page_breaks, page_break_ids = self.find_page_breaks(item)
  File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 123, in find_page_breaks
  File "xpath.pxi", line 446, in lxml.etree.XPath.__init__ (src/lxml/lxml.etree.c:115005)
  File "xpath.pxi", line 214, in lxml.etree._XPathEvaluatorBase._raise_parse_error (src/lxml/lxml.etree.c:112698)
lxml.etree.XPathSyntaxError: Invalid predicate
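To see which elements in the generated HTML carry ids with non-ASCII characters (the headers the crash appears to be tied to), a small standard-library Python script like the following could help narrow things down. `find_non_ascii_ids` and `NonAsciiIdFinder` are hypothetical names; the script only reports ids and does not explain or work around calibre's XPath error.

```python
from __future__ import annotations

from html.parser import HTMLParser


class NonAsciiIdFinder(HTMLParser):
    """Collect (tag, id) for every element whose id contains non-ASCII characters."""

    def __init__(self) -> None:
        super().__init__()
        self.hits: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; values are already unescaped,
        # so id="introducci&#243;n" is reported as "introducción"
        for name, value in attrs:
            if name == "id" and value and not value.isascii():
                self.hits.append((tag, value))


def find_non_ascii_ids(html_text: str) -> list[tuple[str, str]]:
    """Return (tag, id) pairs, in document order, for ids with non-ASCII characters."""
    finder = NonAsciiIdFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hits
```

Applied to the file from the log above, something like `find_non_ascii_ids(open("pandoc-example.html", encoding="utf-8").read())` would list each offending element, whether pandoc placed the id on the hX element itself or on a wrapping section.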
Is this normal calibre behaviour, or is it a pandoc bug?
Thanks in advance for your help.