Suddenly Calibre can't process html

saxondawg · 08-27-2010, 11:14 PM

I have installed the new Calibre (uninstalled, rebooted, installed again) and still find that I can't convert a simple HTML file. Basically, Calibre is telling me that there is no <html> tag ("Spine is empty"), although this is NOT true. Files that I converted this week with no problem cannot be converted again now, so it's NOT a problem in the files themselves. For some reason Calibre cannot read my HTML files. Is there a simple answer? Below is the error message, and after that, the opening code of the HTML file. If this isn't enough, I'll open a ticket--I actually tried that, but couldn't quickly find where to do so.

Thanks

ERROR: Conversion Error: Failed: Convert book 1 of 1

Convert book 1 of 1
Processing archive...
Resolved conversion options
calibre version: 0.7.16
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0.0,
'book_producer': None,
'breadth_first': False,
'change_justification': u'left',
'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\\s+', 'i')) or @class = 'chapter']",
'chapter_mark': u'pagebreak',
'comments': None,
'cover': 'c:\\users\\me\\appdata\\local\\temp\\calibre_0.7.16_tmp_lo05cb\\calibre_0.7.16_fuistw.jpeg',
'debug_pipeline': None,
'disable_font_rescaling': False,
'dont_compress': False,
'dont_package': False,
'extra_css': None,
'font_size_mapping': None,
'footer_regex': u'(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)* \\s*)?\\d+ \\s*.*?\\s*)|(\\s* <a name=\\d+></a>((<img.+?>)* \\s*)?.*? \\s*\\d+))(?= )' ,
'header_regex': u'(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)* \\s*)?\\d+ \\s*.*?\\s*)|(\\s* <a name=\\d+></a>((<img.+?>)* \\s*)?.*? \\s*\\d+))(?= )' ,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x04F839F0>,
'insert_blank_line': False,
'insert_metadata': True,
'isbn': None,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0.0,
'linearize_tables': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'max_levels': 5,
'max_toc_links': 120,
'no_chapters_in_toc': False,
'no_inline_navbars': True,
'no_inline_toc': False,
'output_profile': <calibre.customize.profiles.KindleOutput object at 0x04F83CD0>,
'page_breaks_before': u"//*[name()='h1' or name()='h2']",
'personal_doc': u'[PDOC]',
'prefer_author_sort': True,
'prefer_metadata_cover': False,
'preprocess_html': True,
'pretty_print': False,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': 'c:\\users\\rob\\appdata\\local\\temp\\calibre_0.7.16_tmp_lo05cb\\calibre_0.7.16_vqvw1_.opf',
'remove_first_image': False,
'remove_footer': False,
'remove_header': False,
'remove_paragraph_spacing': True,
'remove_paragraph_spacing_indent_size': 1.5,
'rescale_images': False,
'series': None,
'series_index': None,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'toc_title': None,
'unwrap_factor': 0.0,
'use_auto_toc': True,
'verbose': 2}
InputFormatPlugin: HTML Input running
on c:\users\rob\appdata\local\temp\calibre_0.7.16_tmp_jgzqea\calibre_0.7.16_ktzt6d_plumber\content.opf
Parsing all content...
Manifest item 'toc.ncx' not found
Parsing file.html ...
Failed to parse content in file.html
Traceback (most recent call last):
File "site-packages\calibre\ebooks\oeb\reader.py", line 159, in _manifest_prune_invalid
File "site-packages\calibre\ebooks\oeb\base.py", line 1060, in fget
File "site-packages\calibre\ebooks\oeb\base.py", line 789, in _parse_xhtml
File "site-packages\calibre\ebooks\conversion\preprocess.py", line 350, in __call__
File "site-packages\calibre\ebooks\html\input.py", line 494, in preprocess_html
AttributeError: 'HTMLInput' object has no attribute 'log'

Parsing stylesheet.css ...
Spine item 'html' not found
Python function terminated unexpectedly
Spine is empty (Error Code: 1)
Traceback (most recent call last):
File "site.py", line 103, in main
File "site.py", line 85, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 99, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 24, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 815, in run
File "site-packages\calibre\customize\conversion.py", line 211, in __call__
File "site-packages\calibre\ebooks\html\input.py", line 298, in convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 951, in create_oebbook
File "site-packages\calibre\ebooks\oeb\reader.py", line 72, in __call__
File "site-packages\calibre\ebooks\oeb\reader.py", line 594, in _all_from_opf
File "site-packages\calibre\ebooks\oeb\reader.py", line 289, in _spine_from_opf
calibre.ebooks.oeb.base.OEBError: Spine is empty

* * * * * * * *

Yet the html file that was the source begins with this code:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>(file)</title>

kovidgoyal · 08-28-2010, 12:26 AM

There's a typo in the preprocess code; disable the preprocess option and you should be fine.
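
A minimal sketch of the kind of bug the traceback points to, assuming nothing about calibre's actual source. The names `HypotheticalInput` and `try_convert` are invented for illustration. It shows why an `AttributeError` like the one in the log would appear only when the preprocess option is switched on, and why disabling the option avoids it.

```python
class HypotheticalInput:
    """Stand-in for an input plugin; NOT calibre's real class."""

    def __init__(self, preprocess_enabled):
        self.preprocess_enabled = preprocess_enabled
        # Note: no `self.log` attribute is ever assigned here.

    def preprocess_html(self, html):
        if not self.preprocess_enabled:
            # With the option off, the faulty line below is never reached.
            return html
        # The typo: this refers to an attribute the object does not have.
        self.log("preprocessing")
        return html.strip()


def try_convert(plugin, html):
    """Run the preprocess step and report a failure as text instead of raising."""
    try:
        return plugin.preprocess_html(html)
    except AttributeError as err:
        return "failed: %s" % err


if __name__ == "__main__":
    print(try_convert(HypotheticalInput(False), "<html></html>"))
    print(try_convert(HypotheticalInput(True), "<html></html>"))
```

With the option off, the input passes through untouched; with it on, the call fails before any HTML is examined, which is consistent with perfectly valid files suddenly refusing to convert.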

saxondawg · 08-28-2010, 12:59 AM

That was it! Thank you sir. I don't know whether I introduced that error somehow (don't remember fooling with the code on that line) or whether it entered on this new update of the program. But I knew it was something like that.

Thank you very much!

Rachel · 09-02-2010, 03:56 AM

I'm having the same problem. Can you tell me how I'll recognise the preprocess code, and where to find it so that I can disable it? Or is this something that will be corrected in 7.17?

Thank you.

ldolse · 09-02-2010, 04:35 AM

The preprocess code is an option under Structure Detection; it's called "Preprocess input file to possibly improve structure detection" in the GUI. It's disabled by default, and should only cause a problem if you check that box for HTML files.

Are you seeing it for some other input format? 7.17 will fix it for html.
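
If you are unsure whether the box was checked for a particular conversion, the saved error log records it in the "Resolved conversion options" block as `'preprocess_html'`. A small sketch for checking a saved log follows; the helper name `preprocess_enabled` is made up here, and it assumes the log prints options in the same `'name': value` form as the log posted above.

```python
import re


def preprocess_enabled(log_text):
    """Return True or False for the logged 'preprocess_html' value, or None if absent."""
    match = re.search(r"'preprocess_html':\s*(True|False)", log_text)
    if match is None:
        return None
    return match.group(1) == "True"


if __name__ == "__main__":
    sample = "'prefer_metadata_cover': False,\n'preprocess_html': True,\n'pretty_print': False,"
    print(preprocess_enabled(sample))
```

A result of `None` means the option line was not found, for example because the log was cut off before the options block.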

Rachel · 09-02-2010, 07:05 AM

ldolse - thanks for the ideas. The preprocess box was checked, but unchecking it doesn't make any difference. I can't see that I can check/uncheck it for specific input formats. I also realised that there is no tab for HTML input - I can't remember if that is the norm or not, and that was the initial reason for re-downloading Calibre.

I shall have to see what the next update brings, and see if that solves the problem.

saxondawg · 09-03-2010, 01:00 AM

Rachel, you might do as I did and save the error log. Kovid was able to read mine above, and see that the typo was causing the problem.
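
The log is useful because the final message ("Spine is empty") is only the last link in a chain: the file fails to parse, it is dropped, and nothing is left for the spine. Here is a rough sketch of that cascade. Every name in it (`build_spine`, `SpineError`, the `parse` callback) is hypothetical, and it is a guess at the general pattern rather than calibre's real reader code.

```python
class SpineError(Exception):
    """Hypothetical stand-in for the 'Spine is empty' error."""


def build_spine(manifest, spine_ids, parse):
    """Keep only manifest items that parse, then build the reading order from them."""
    valid = {}
    for item_id, content in manifest.items():
        try:
            valid[item_id] = parse(content)
        except Exception as err:
            # The real cause is reported here, then the item is silently dropped.
            print("Failed to parse content in %s: %r" % (item_id, err))
    spine = [item_id for item_id in spine_ids if item_id in valid]
    if not spine:
        # By this point the original exception is no longer part of the error.
        raise SpineError("Spine is empty")
    return spine


def broken_parse(content):
    """Simulates a preprocess step that fails regardless of the input."""
    raise AttributeError("object has no attribute 'log'")


if __name__ == "__main__":
    try:
        build_spine({"html": "<html></html>"}, ["html"], broken_parse)
    except SpineError as err:
        print("Conversion failed:", err)
```

The takeaway for reading a log like the one above: look at the first traceback, not the last line.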

Rachel · 09-03-2010, 05:01 AM

Yes, I started to do that, then assumed it was the same as yours, and left it.

I've attached the error message now

Rachel · 09-04-2010, 07:40 AM

Hi - I just wanted to thank the Calibre team for the fix - I have installed 7.17 this morning and my html files convert quite happily again.

DoctorOhh · 09-04-2010, 07:49 AM

Quote:

Originally Posted by saxondawg

I don't know whether I introduced that error somehow (don't remember fooling with the code on that line) or whether it entered on this new update of the program.

No you didn't cause the error.

Quote:

Originally Posted by saxondawg

Rachel, you might do as I did and save the error log. Kovid was able to read mine above, and see that the typo was causing the problem.

Your error log did help Kovid figure out where in the code to look, and he found a typo, accidentally left in by him or one of the other developers, that caused the error.

You are correct: enclosing the error log is always a good idea, unless the problem has already been diagnosed by Kovid and he has stated that the fix will be in the next release, as he did in this thread.

@Rachel, I'm glad it's working for you again.
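
For anyone saving a log before posting, a short sketch that pulls out the exception line that ends each traceback, so the underlying error is easy to spot and quote. The function name `exception_lines` is invented here, and it assumes the log looks like the one in the first post: each traceback is a run of `File ...` lines followed directly by the exception, with no source-code lines in between.

```python
def exception_lines(log_text):
    """Return the exception line that closes each traceback in the log, in order."""
    found = []
    in_traceback = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("Traceback (most recent call last):"):
            in_traceback = True
        elif in_traceback and line and not line.startswith("File "):
            # First non-empty line that is not a stack frame: the exception itself.
            found.append(line)
            in_traceback = False
    return found


if __name__ == "__main__":
    sample = "\n".join([
        "Failed to parse content in file.html",
        "Traceback (most recent call last):",
        'File "site-packages\\calibre\\ebooks\\html\\input.py", line 494, in preprocess_html',
        "AttributeError: 'HTMLInput' object has no attribute 'log'",
        "",
        "Spine is empty (Error Code: 1)",
        "Traceback (most recent call last):",
        'File "site-packages\\calibre\\ebooks\\oeb\\reader.py", line 289, in _spine_from_opf',
        "calibre.ebooks.oeb.base.OEBError: Spine is empty",
    ])
    for line in exception_lines(sample):
        print(line)
```

On the log from this thread, the first line returned would be the `AttributeError`, which is the one that identified the bug.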

saxondawg · 09-12-2010, 03:20 PM

When the newest Calibre came out, I noted the fix on this specific bug and was glad I could be a part of detecting it. I had always checked that box automatically just figuring "why not?"

It's amazing how smart Calibre is now in converting my documents. Seems to me it does a better job of cooperating with stubborn XML stylesheets to create the final e-book we want.
