04-14-2011, 07:38 PM | #1 |
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2011
Device: Kindle 3
|
Converting pandoc-generated HTML to ePUB with Calibre
Hi,
I recently formatted a (very long) book with pandoc, which I find really convenient for generating HTML. I then convert the HTML to ePUB using calibre. Pandoc can generate ePUB itself, but the process is largely automatic and offers little customization. For instance, TOC levels cannot be customized (only chapters appear in the TOC). That's why I use calibre to convert from HTML to ePUB: from the command line I can control many aspects of the generated ePUB (cover, TOC, etc.).
The problem is that the book I'm converting is in Spanish, and some of the chapter titles contain accented characters (á é í ó ú). For every section element (h1, h2, etc.) pandoc generates an id that you can use to refer to that element in the text. For example, if a chapter is titled "Introducción", pandoc generates this in the HTML. Code:
<h1 id="introducción">Introducción</h1>
Here's calibre's output. Code:
Converting ebook with calibre
1% Converting input to HTML...
InputFormatPlugin: HTML Input running on /home/Literature/Calibre-tests y pruebas/test-pandoc/pandoc-example.html
Language not specified
Building file list...
Normalizing filename cases
Rewriting HTML links
34% Running transforms on ebook...
Merging user specified metadata...
Detecting structure...
Detected chapter: My Book
Detected chapter: Chapter One
Detected chapter: Chapter Two
Auto generated TOC with 12 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
67% Creating EPUB Output
Traceback (most recent call last):
  File "/usr/bin/ebook-convert", line 19, in <module>
    sys.exit(main())
  File "/usr/lib/calibre/calibre/ebooks/conversion/cli.py", line 279, in main
    plumber.run()
  File "/usr/lib/calibre/calibre/ebooks/conversion/plumber.py", line 1018, in run
    self.opts, self.log)
  File "/usr/lib/calibre/calibre/ebooks/epub/output.py", line 169, in convert
    split(self.oeb, self.opts)
  File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 57, in __call__
    self.split_item(item)
  File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 64, in split_item
    page_breaks, page_break_ids = self.find_page_breaks(item)
  File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 123, in find_page_breaks
    page_breaks_.append((XPath('//*[@id=%r]'%id),
  File "xpath.pxi", line 446, in lxml.etree.XPath.__init__ (src/lxml/lxml.etree.c:115005)
  File "xpath.pxi", line 214, in lxml.etree._XPathEvaluatorBase._raise_parse_error (src/lxml/lxml.etree.c:112698)
lxml.etree.XPathSyntaxError: Invalid predicate
Thanks in advance for your help.
|
04-14-2011, 07:59 PM | #2 |
creator of calibre
Posts: 43,844
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The problem is the non-ASCII characters in the id attribute. That is illegal in XHTML, as far as I recall.
|
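One possible workaround, given the diagnosis above, is to rewrite the ids to plain ASCII before handing the HTML to calibre. The following is only a minimal sketch, not a feature of pandoc or calibre: the helper names `ascii_slug` and `asciify_ids` are hypothetical, and it assumes that the ids appear as double-quoted `id="..."` attributes and that internal links use `href="#..."`. Code:

```python
import re
import unicodedata


def ascii_slug(identifier):
    """Strip accents (ó -> o) and drop any remaining non-ASCII characters."""
    decomposed = unicodedata.normalize("NFKD", identifier)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def asciify_ids(html):
    """Rewrite id="..." values and matching href="#..." links to ASCII.

    Assumption: attributes are double-quoted, as in the pandoc output
    quoted in this thread. Visible text is left untouched.
    """
    def fix_id(match):
        return 'id="%s"' % ascii_slug(match.group(1))

    def fix_href(match):
        return 'href="#%s"' % ascii_slug(match.group(1))

    # The lookbehind avoids touching attributes such as data-id="...".
    html = re.sub(r'(?<![\w-])id="([^"]*)"', fix_id, html)
    html = re.sub(r'(?<![\w-])href="#([^"]*)"', fix_href, html)
    return html


if __name__ == "__main__":
    sample = '<h1 id="introducción">Introducción</h1>'
    print(asciify_ids(sample))
    # prints: <h1 id="introduccion">Introducción</h1>
```

Two caveats with this approach: distinct ids can collapse to the same ASCII value (for example "año" and "ano"), and an id made up entirely of non-Latin characters would become empty, so the result should be checked for duplicates before converting.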
04-15-2011, 01:25 PM | #3 |
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2011
Device: Kindle 3
|
|
|