
MobileRead Forums > E-Book Software > Calibre > Conversion

04-14-2011, 07:38 PM   #1
Wintermute
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2011
Device: Kindle 3
Converting pandoc generated HTML to ePUB with Calibre

Hi,

I recently formatted a (very long) book with pandoc. I find pandoc nice and really convenient for generating HTML, which I then convert to ePUB with calibre. Pandoc can generate ePUB itself, but the process is fully automatic and leaves little room for customization. For instance, the TOC depth cannot be changed (only chapters appear in the TOC). That's why I use calibre for the HTML-to-ePUB step: from the command line I can control a lot about the generated ePUB (cover, TOC, etc.).
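To illustrate the kind of command-line control meant here, the sketch below assembles an `ebook-convert` call. The options `--cover`, `--level1-toc`/`--level2-toc`/`--level3-toc` (XPath expressions, with `h:` as the XHTML namespace prefix) and `--language` are calibre `ebook-convert` options; the helper name `build_ebook_convert_cmd` and the file names are hypothetical placeholders. Check `ebook-convert book.html book.epub -h` for the exact option list of your calibre version.

```python
import shlex


def build_ebook_convert_cmd(src, dst, cover=None, toc_xpaths=(), language=None):
    """Assemble (but do not run) an ebook-convert command line.

    Hypothetical helper. toc_xpaths holds up to three XPath expressions,
    passed as --level1-toc, --level2-toc and --level3-toc in that order.
    """
    if len(toc_xpaths) > 3:
        raise ValueError("ebook-convert only has three TOC level options")
    cmd = ["ebook-convert", src, dst]
    if cover:
        cmd += ["--cover", cover]
    for level, xpath in enumerate(toc_xpaths, start=1):
        cmd += [f"--level{level}-toc", xpath]
    if language:
        # Setting the language also silences the "Language not specified" note.
        cmd += ["--language", language]
    return cmd


cmd = build_ebook_convert_cmd(
    "book.html", "book.epub",
    cover="cover.jpg",
    toc_xpaths=("//h:h1", "//h:h2"),  # chapters and sections in the TOC
    language="es",
)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

To actually run the conversion, pass the list to `subprocess.run(cmd, check=True)`.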

The problem is that the book I'm converting is in Spanish, and some of the chapter titles contain accents (á é í ó ú). For every heading element (h1, h2, etc.), pandoc generates an id you can use to refer to that element from elsewhere in the text. For example, if a chapter is titled "Introducción", pandoc generates this HTML:

Code:
<h1 id="introducción">Introducción</h1>
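For reference, pandoc's manual (auto_identifiers extension) describes how heading text becomes an id: remove formatting and footnotes, drop every character that is not alphanumeric, an underscore, a hyphen or a period, turn spaces and newlines into hyphens, lowercase everything, strip everything before the first letter, and fall back to `section` if nothing is left. The sketch below is a simplified Python approximation of those rules as documented for current pandoc releases (the 2011 version may differ in details). `pandoc_style_id` is a hypothetical name, and it ignores formatting removal and the `-1`, `-2` suffixes pandoc appends to duplicate ids. It shows why the accent ends up in the id: `ó` is a Unicode letter, so the "alphanumeric" rule keeps it.

```python
def pandoc_style_id(title: str) -> str:
    """Approximate pandoc's auto_identifiers rules for a plain-text title."""
    kept = []
    for ch in title:
        if ch.isalnum() or ch in "_-.":
            # Unicode-aware: accented letters such as 'ó' count as alphanumeric.
            kept.append(ch.lower())
        elif ch.isspace():
            kept.append("-")
        # Any other punctuation is dropped.
    ident = "".join(kept)
    # Identifiers may not start with a digit or punctuation mark.
    for i, ch in enumerate(ident):
        if ch.isalpha():
            return ident[i:]
    return "section"


print(pandoc_style_id("Introducción"))
```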
Calibre crashes if any hX heading contains non-ASCII characters.

Here's calibre's output.

Code:
Converting ebook with calibre
1% Converting input to HTML...
InputFormatPlugin: HTML Input running
on /home/Literature/Calibre-tests y pruebas/test-pandoc/pandoc-example.html
Language not specified
Building file list...
Normalizing filename cases
Rewriting HTML links
34% Running transforms on ebook...
Merging user specified metadata...
Detecting structure...
        Detected chapter: My Book
        Detected chapter: Chapter One
        Detected chapter: Chapter Two
Auto generated TOC with 12 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
67% Creating EPUB Output
Traceback (most recent call last):
  File "/usr/bin/ebook-convert", line 19, in <module>
    sys.exit(main())
  File "/usr/lib/calibre/calibre/ebooks/conversion/cli.py", line 279, in main
    plumber.run()
  File "/usr/lib/calibre/calibre/ebooks/conversion/plumber.py", line 1018, in run
    self.opts, self.log)
  File "/usr/lib/calibre/calibre/ebooks/epub/output.py", line 169, in convert
    split(self.oeb, self.opts)
  File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 57, in __call__
    self.split_item(item)
  File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 64, in split_item
    page_breaks, page_break_ids = self.find_page_breaks(item)
  File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 123, in find_page_breaks
    page_breaks_.append((XPath('//*[@id=%r]'%id),
  File "xpath.pxi", line 446, in lxml.etree.XPath.__init__ (src/lxml/lxml.etree.c:115005)
  File "xpath.pxi", line 214, in lxml.etree._XPathEvaluatorBase._raise_parse_error (src/lxml/lxml.etree.c:112698)
lxml.etree.XPathSyntaxError: Invalid predicate
Is this normal calibre behaviour, or is it a pandoc bug?

Thanks in advance for your help.
04-14-2011, 07:59 PM   #2
kovidgoyal
creator of calibre
Posts: 43,844
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The problem is the non-ASCII characters in the id attribute. That is illegal in XHTML, as far as I recall.
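One possible workaround, assuming the crash is triggered only by the non-ASCII ids: strip the accents from every `id` and from the matching `href="#..."` links before handing the HTML to calibre, leaving the visible heading text untouched. A minimal Python sketch; `ascii_fold` and `fold_ids` are hypothetical helpers, and the regular expressions assume double-quoted attributes, as in pandoc's output, and same-file links only.

```python
import re
import unicodedata
from urllib.parse import unquote


def ascii_fold(text: str) -> str:
    """Strip accents, e.g. 'Introducción' -> 'Introduccion'.

    NFKD splits 'ó' into 'o' plus a combining accent, and the ASCII encode
    drops the accent. Characters with no ASCII base (non-Latin scripts)
    are dropped entirely.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Assumption: attributes are double-quoted; only links inside the same
# file ("#fragment") are rewritten.
_ID_RE = re.compile(r'(\sid=")([^"]*)(")')
_HREF_RE = re.compile(r'(\shref="#)([^"]*)(")')


def _fold_match(m: re.Match) -> str:
    # unquote() also handles fragments written as percent-escapes.
    return m.group(1) + ascii_fold(unquote(m.group(2))) + m.group(3)


def fold_ids(html: str) -> str:
    """Return html with ASCII-only ids and matching internal links."""
    html = _ID_RE.sub(_fold_match, html)
    return _HREF_RE.sub(_fold_match, html)
```

Usage: read the pandoc output with `open("book.html", encoding="utf-8").read()`, write `fold_ids(...)` of it to a new file, and convert that file with calibre. Folding can make two ids collide (for example "año" and "ano"), so on a long book it is worth checking the result for duplicate ids.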
04-15-2011, 01:25 PM   #3
Wintermute
Quote:
Originally Posted by kovidgoyal
The problem is the non-ASCII characters in the id attribute. That is illegal in XHTML, as far as I recall.
Thanks Kovid.

Similar Threads
Thread Thread Starter Forum Replies Last Post
Covers in ePub files generated by Calibre daviddem Calibre 14 06-30-2011 09:18 PM
How much shall I pay you for converting HTML to ePUB? vadimzn ePub 8 04-07-2011 01:46 AM
Calibre Indent Issue When Removing Blank Lines (Converting From HTML to MOBI or EPUB) David Derrico Calibre 5 08-04-2010 12:13 AM
bookmark issues converting HTML to EPUB isabellkirsten Calibre 0 04-09-2010 11:47 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.