Removing header and footer

I've run into a problem while converting a PDF file with Calibre: the header and footer removal regexes test OK in the structure detection wizard, but the headers and footers are still present in the final MOBI output.
Below are the job details from the conversion. I noticed an error in the parsing section ("Initial parse failed"), which I assume is related. Could anyone help me figure out what I need to do to fix this?
 
Quote:

Convert book 1 of 1 (El Lobo Estepario (Spanish Edition)) 
Resolved conversion options 
calibre version: 0.7.13 
{'asciiize': False, 
 'author_sort': None, 
 'authors': None, 
 'base_font_size': 0.0, 
 'book_producer': None, 
 'change_justification': u'justify', 
 'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\\s+', 'i')) or @class = 'chapter']", 
 'chapter_mark': u'pagebreak', 
 'comments': None, 
 'cover': '/tmp/calibre_0.7.13_qslwfi.jpeg', 
 'debug_pipeline': u'/home/ricardo/Documents/caliber_debugs', 
 'disable_font_rescaling': False, 
 'dont_compress': False, 
 'extra_css': None, 
 'font_size_mapping': None, 
 'footer_regex': u'<A name=.*?></a><i><b>El lobo estepario </b></i><br>\n<b>Hermann Hesse </b><br>', 
 'header_regex': u'<br>\n.*? <br>\n<hr>\n', 
 'input_encoding': None, 
 'input_profile': <calibre.customize.profiles.InputProfile object at 0xa23c50c>, 
 'insert_blank_line': True, 
 'insert_metadata': False, 
 'isbn': None, 
 'keep_ligatures': False, 
 'language': None, 
 'level1_toc': None, 
 'level2_toc': None, 
 'level3_toc': None, 
 'line_height': 0.0, 
 'linearize_tables': False, 
 'margin_bottom': 5.0, 
 'margin_left': 5.0, 
 'margin_right': 5.0, 
 'margin_top': 5.0, 
 'max_toc_links': 50, 
 'new_pdf_engine': False, 
 'no_chapters_in_toc': True, 
 'no_images': True, 
 'no_inline_navbars': True, 
 'no_inline_toc': True, 
 'output_profile': <calibre.customize.profiles.KindleOutput object at 0xa23cdac>, 
 'page_breaks_before': u"//*[name()='h1' or name()='h2']", 
 'personal_doc': u'[PDOC]', 
 'prefer_author_sort': False, 
 'prefer_metadata_cover': False, 
 'preprocess_html': True, 
 'pretty_print': False, 
 'pubdate': None, 
 'publisher': None, 
 'rating': None, 
 'read_metadata_from_opf': '/tmp/calibre_0.7.13_3RmSrp.opf', 
 'remove_first_image': True, 
 'remove_footer': True, 
 'remove_header': True, 
 'remove_paragraph_spacing': False, 
 'remove_paragraph_spacing_indent_size': 1.5, 
 'rescale_images': False, 
 'series': None, 
 'series_index': None, 
 'tags': None, 
 'timestamp': None, 
 'title': None, 
 'title_sort': None, 
 'toc_filter': None, 
 'toc_threshold': 6, 
 'toc_title': None, 
 'unwrap_factor': 0.5, 
 'use_auto_toc': False, 
 'verbose': 2} 
InputFormatPlugin: PDF Input running 
on /home/ricardo/Calibre Library/Hermann Hesse/El Lobo Estepario (Spanish Edition) (13523)/El Lobo Estepario (Spanish Edition) - Hermann Hesse.pdf 
Converting file to html... 
pdftohtml log: 
 
Retrieving document metadata... 
Generating manifest... 
Rendering manifest... 
Input debug saved to: /home/ricardo/Documents/caliber_debugs/input 
Parsing all content... 
Parsing index.html ... 
Initial parse failed: 
Traceback (most recent call last): 
  File "/home/kovid/build/calibre/src/calibre/ebooks/oeb/base.py", line 816, in first_pass 
  File "lxml.etree.pyx", line 2538, in lxml.etree.fromstring (src/lxml/lxml.etree.c:48266) 
  File "parser.pxi", line 1536, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:71653) 
  File "parser.pxi", line 1408, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:70449) 
  File "parser.pxi", line 898, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:67144) 
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63820) 
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64741) 
  File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64084) 
XMLSyntaxError: Opening and ending tag mismatch: META line 8 and head, line 9, column 8 
 
Parsing file 'index.html' as HTML 
Forcing index.html into XHTML namespace 
Generating default TOC from spine... 
Parsed HTML written to: /home/ricardo/Documents/caliber_debugs/parsed 
Merging user specified metadata... 
Detecting structure... 
Auto generated TOC with 3 entries. 
Structured HTML written to: /home/ricardo/Documents/caliber_debugs/structure 
Flattening CSS and remapping font sizes... 
Source base font size is 12.00000pt 
Cleaning up manifest... 
Trimming unused files from manifest... 
Parsing stylesheet.css ... 
 
Processed HTML written to: /home/ricardo/Documents/caliber_debugs/processed 
Creating MOBI Output... 
Applying case-transforming CSS... 
Parsing manglecase.css ... 
Rasterizing SVG images... 
Converting XHTML to Mobipocket markup... 
Serializing markup content... 
Hyperlink target 'index.html#23' not found 
Hyperlink target 'index.html#10' not found 
Hyperlink target 'index.html#2' not found 
  Compressing markup content... 
Generating flat CTOC ... 
adding (klass:chapter depth:1) TOC: ANOTACIONES DE HARRY HALLER ..................................................  ...................... --> index.html#2 to flat ctoc 
  Ignoring missing TOC entry: TOC: ANOTACIONES DE HARRY HALLER ..................................................  ...................... --> index.html#2 
adding (klass:chapter depth:1) TOC: TRACTAT DEL LOBO ESTEPARIO ..................................................  ....................... --> index.html#10 to flat ctoc 
  Ignoring missing TOC entry: TOC: TRACTAT DEL LOBO ESTEPARIO ..................................................  ....................... --> index.html#10 
adding (klass:chapter depth:1) TOC: SIGUEN LAS ANOTACIONES DE HARRY HALLER ..................................................  ........... --> index.html#23 to flat ctoc 
  Ignoring missing TOC entry: TOC: SIGUEN LAS ANOTACIONES DE HARRY HALLER ..................................................  ........... --> index.html#23 
Instantiating a book MobiDocument of type 0x2 
chapterCount: 0 
  CNCX utilization: 1 record, 0% full 
  No entries found in TOC ... 
  Writing unindexed mobi ... 
Serializing images... 
MOBI output written to /tmp/calibre_0.7.13_FAAu2J.mobi
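
For anyone who wants to reproduce this, the two regexes from the options above can be checked outside Calibre with a short Python sketch. This is only an illustration under assumptions: the `SAMPLE` fragment is made up to resemble what pdftohtml might emit (the real HTML is in the debug "input" folder), and `strip_header_footer` is a hypothetical helper name, not a Calibre function. It uses plain `re.subn`, which may not behave exactly like Calibre's own preprocessing.

```python
import re

# Patterns copied from the 'header_regex' and 'footer_regex' options in the log.
HEADER_REGEX = r'<br>\n.*? <br>\n<hr>\n'
FOOTER_REGEX = (r'<A name=.*?></a><i><b>El lobo estepario </b></i><br>\n'
                r'<b>Hermann Hesse </b><br>')


def strip_header_footer(html):
    """Remove all header and footer matches; return the text and both counts."""
    html, n_headers = re.subn(HEADER_REGEX, '', html)
    html, n_footers = re.subn(FOOTER_REGEX, '', html)
    return html, n_headers, n_footers


# Hypothetical page boundary, NOT taken from the real file.
SAMPLE = (
    'final de la pagina anterior<br>\n'
    'Texto de cabecera <br>\n'
    '<hr>\n'
    '<A name=2></a><i><b>El lobo estepario </b></i><br>\n'
    '<b>Hermann Hesse </b><br>\n'
    'Primera linea del texto<br>\n'
)

cleaned, n_headers, n_footers = strip_header_footer(SAMPLE)
print('header matches:', n_headers)
print('footer matches:', n_footers)
print(cleaned)
```

To test against the real data, replace `SAMPLE` with the contents of the HTML file in the debug "input" folder. Both patterns are case-sensitive and expect exact `\n` line breaks, so a zero count there would point at a mismatch between the regex and the actual markup rather than at the MOBI output stage.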
			