08-09-2010, 01:47 AM   #1
radicalnomad
Junior Member
Posts: 3
Karma: 10
Join Date: Jul 2010
Device: Kindle 2
Removing header and footer

I've run into a problem while converting a PDF file with Calibre: the header and footer removal, which tests OK in the structure detection wizard, is not applied in the final MOBI output.

Here are the job details from the conversion. There is an error in the parsing section, which I assume is related. Could anyone tell me what I need to do to solve this?

Quote:
Convert book 1 of 1 (El Lobo Estepario (Spanish Edition))
Resolved conversion options
calibre version: 0.7.13
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0.0,
'book_producer': None,
'change_justification': u'justify',
'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\\s+', 'i')) or @class = 'chapter']",
'chapter_mark': u'pagebreak',
'comments': None,
'cover': '/tmp/calibre_0.7.13_qslwfi.jpeg',
'debug_pipeline': u'/home/ricardo/Documents/caliber_debugs',
'disable_font_rescaling': False,
'dont_compress': False,
'extra_css': None,
'font_size_mapping': None,
'footer_regex': u'<A name=.*?></a><i><b>El lobo estepario </b></i><br>\n<b>Hermann Hesse </b><br>',
'header_regex': u'<br>\n.*? <br>\n<hr>\n',
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0xa23c50c>,
'insert_blank_line': True,
'insert_metadata': False,
'isbn': None,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0.0,
'linearize_tables': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'max_toc_links': 50,
'new_pdf_engine': False,
'no_chapters_in_toc': True,
'no_images': True,
'no_inline_navbars': True,
'no_inline_toc': True,
'output_profile': <calibre.customize.profiles.KindleOutput object at 0xa23cdac>,
'page_breaks_before': u"//*[name()='h1' or name()='h2']",
'personal_doc': u'[PDOC]',
'prefer_author_sort': False,
'prefer_metadata_cover': False,
'preprocess_html': True,
'pretty_print': False,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': '/tmp/calibre_0.7.13_3RmSrp.opf',
'remove_first_image': True,
'remove_footer': True,
'remove_header': True,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'rescale_images': False,
'series': None,
'series_index': None,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'toc_title': None,
'unwrap_factor': 0.5,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: PDF Input running
on /home/ricardo/Calibre Library/Hermann Hesse/El Lobo Estepario (Spanish Edition) (13523)/El Lobo Estepario (Spanish Edition) - Hermann Hesse.pdf
Converting file to html...
pdftohtml log:

Retrieving document metadata...
Generating manifest...
Rendering manifest...
Input debug saved to: /home/ricardo/Documents/caliber_debugs/input
Parsing all content...
Parsing index.html ...
Initial parse failed:
Traceback (most recent call last):
File "/home/kovid/build/calibre/src/calibre/ebooks/oeb/base.py", line 816, in first_pass
File "lxml.etree.pyx", line 2538, in lxml.etree.fromstring (src/lxml/lxml.etree.c:48266)
File "parser.pxi", line 1536, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:71653)
File "parser.pxi", line 1408, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:70449)
File "parser.pxi", line 898, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:67144)
File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63820)
File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64741)
File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64084)
XMLSyntaxError: Opening and ending tag mismatch: META line 8 and head, line 9, column 8

Parsing file 'index.html' as HTML
Forcing index.html into XHTML namespace
Generating default TOC from spine...
Parsed HTML written to: /home/ricardo/Documents/caliber_debugs/parsed
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 3 entries.
Structured HTML written to: /home/ricardo/Documents/caliber_debugs/structure
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Cleaning up manifest...
Trimming unused files from manifest...
Parsing stylesheet.css ...

Processed HTML written to: /home/ricardo/Documents/caliber_debugs/processed
Creating MOBI Output...
Applying case-transforming CSS...
Parsing manglecase.css ...
Rasterizing SVG images...
Converting XHTML to Mobipocket markup...
Serializing markup content...
Hyperlink target 'index.html#23' not found
Hyperlink target 'index.html#10' not found
Hyperlink target 'index.html#2' not found
Compressing markup content...
Generating flat CTOC ...
adding (klass:chapter depth:1) TOC: ANOTACIONES DE HARRY HALLER .................................................. ...................... --> index.html#2 to flat ctoc
Ignoring missing TOC entry: TOC: ANOTACIONES DE HARRY HALLER .................................................. ...................... --> index.html#2
adding (klass:chapter depth:1) TOC: TRACTAT DEL LOBO ESTEPARIO .................................................. ....................... --> index.html#10 to flat ctoc
Ignoring missing TOC entry: TOC: TRACTAT DEL LOBO ESTEPARIO .................................................. ....................... --> index.html#10
adding (klass:chapter depth:1) TOC: SIGUEN LAS ANOTACIONES DE HARRY HALLER .................................................. ........... --> index.html#23 to flat ctoc
Ignoring missing TOC entry: TOC: SIGUEN LAS ANOTACIONES DE HARRY HALLER .................................................. ........... --> index.html#23
Instantiating a book MobiDocument of type 0x2
chapterCount: 0
CNCX utilization: 1 record, 0% full
No entries found in TOC ...
Writing unindexed mobi ...
Serializing images...
MOBI output written to /tmp/calibre_0.7.13_FAAu2J.mobi
08-09-2010, 10:56 AM   #2
kovidgoyal
creator of calibre
Posts: 26,450
Karma: 5383257
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The solution is to run the conversion in debug mode and look at the actual HTML in the input subdirectory of the debug folder (for some files, it can differ slightly from the HTML shown in the wizard).
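One way to act on this advice is to test the regex directly against the HTML that the debug run saved. The sketch below is only an illustration: it uses plain Python `re` with no flags, which may not be exactly how calibre applies the expression, and the two HTML fragments are invented samples, not taken from the actual book. The helper names `count_matches` and `count_matches_in_file` are hypothetical.

```python
import re
from pathlib import Path


def count_matches(pattern: str, html: str) -> int:
    """Return how many times `pattern` matches in `html`."""
    return len(re.findall(pattern, html))


def count_matches_in_file(pattern: str, path: str) -> int:
    """Same check, but against a file such as the debug input HTML."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return count_matches(pattern, text)


# The footer regex copied from the job log in post #1.
FOOTER_REGEX = (
    r'<A name=.*?></a><i><b>El lobo estepario </b></i><br>\n'
    r'<b>Hermann Hesse </b><br>'
)

# Invented sample: markup in the form the regex was written for.
wizard_html = (
    '<A name=3></a><i><b>El lobo estepario </b></i><br>\n'
    '<b>Hermann Hesse </b><br>\n'
)

# Invented sample: the same footer with small markup differences
# (lower-case tag, quoted attribute, self-closing <br/>).
debug_html = (
    '<a name="3"></a><i><b>El lobo estepario </b></i><br/>\n'
    '<b>Hermann Hesse </b><br/>\n'
)

print(count_matches(FOOTER_REGEX, wizard_html))  # matches the first form
print(count_matches(FOOTER_REGEX, debug_html))   # finds nothing in the second
```

To check the real file, something like `count_matches_in_file(FOOTER_REGEX, "/home/ricardo/Documents/caliber_debugs/input/index.html")` could be used, assuming the saved file is named `index.html` as the log suggests. A count of zero means the expression never matches the HTML that the conversion actually sees.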
08-26-2010, 11:34 AM   #3
radicalnomad
Junior Member
Posts: 3
Karma: 10
Join Date: Jul 2010
Device: Kindle 2
Kovid, that was indeed the problem. It took me a while, but after analyzing the HTML more closely and learning more about regular expressions, I solved the issue by using a less specific regex. Thanks for your help!
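The regex that finally worked was not posted, so the following is only a guess at what "less specific" might look like. It is a minimal sketch in plain Python `re`: the `LOOSE` pattern, the `strip_footer` helper, and the sample `page` are all hypothetical. The loosened pattern ignores case, accepts any attribute value, tolerates `<br>` or `<br/>`, and allows optional whitespace between tags.

```python
import re

# The original footer regex from the job log in post #1.
STRICT = (
    r'<A name=.*?></a><i><b>El lobo estepario </b></i><br>\n'
    r'<b>Hermann Hesse </b><br>'
)

# Hypothetical looser variant: case-insensitive, flexible whitespace,
# either <br> or <br/>.
LOOSE = (
    r'(?i)<a name=[^>]*></a>\s*<i><b>\s*El lobo estepario\s*</b></i>\s*'
    r'<br\s*/?>\s*<b>\s*Hermann Hesse\s*</b>\s*<br\s*/?>'
)


def strip_footer(pattern: str, html: str) -> str:
    """Remove every match of `pattern` from `html`."""
    return re.sub(pattern, '', html)


# Invented sample page; the real debug HTML may look different.
page = (
    '<p>texto de la novela</p>\n'
    '<a name="3"></a><i><b>El lobo estepario</b></i><br/>\n'
    '<b>Hermann Hesse</b><br/>\n'
    '<p>siguiente pagina</p>\n'
)

print(strip_footer(STRICT, page) == page)          # strict pattern changes nothing
print('Hermann Hesse' in strip_footer(LOOSE, page))  # loose pattern removes the footer
```

The trade-off is that a looser pattern is more likely to match text it should not, so it is worth checking the result against the whole file before converting.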