06-21-2011, 08:12 AM | #1 |
Junior Member
Posts: 4
Karma: 10
Join Date: Jun 2011
Device: Kindle
|
Removing navigation bars from converted CHM
First of all I looked at the sticky on using the search/replace in Calibre. If that's the only way to do what I want to do, that's fine. I'll have to brush up on my regular expressions.
The approach I've been trying so far is a bit different though. I converted the CHM file to HTMLZ, unpacked it and ran a Python script I wrote to remove the navigation controls from the HTML, then packed it back up again, loaded the HTMLZ into Calibre, and tried to convert it to MOBI. I get an error that ends with:
File "site-packages\calibre\ebooks\oeb\reader.py", line 300, in _spine_from_opf
calibre.ebooks.oeb.base.OEBError: Spine is empty
I've tried doing this two ways so far. First I removed the table that was holding the navigation controls. That's everything between and including <table>...</table>. When that didn't work I tried removing the row from the enclosing table that held the table with the navigation controls. I got the same error both times.
As far as I know the deletions were clean, leaving valid HTML behind. So at this point I can only presume that some elements of the HTML structure are fixed and necessary for the document to scan properly. Am I removing too much? Too little? There seems to be some non-obvious meta structure to the HTML that is required for it to scan properly. Any guidance would be helpful. Thanks. |
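For reference, the unpack / strip / repack workflow described above could be sketched roughly as follows. This is only an illustration, not the poster's actual script: the `class="nav"` marker, the `strip_nav` and `rewrite_htmlz` names, and the UTF-8 encoding are all assumptions, since the thread does not say how the navigation tables are identified. The one design choice worth noting is that the archive is rewritten member by member, so every file keeps its original path inside the HTMLZ and packaging differences are ruled out as a variable.

```python
import re
import zipfile

# Hypothetical marker: assumes the navigation table carries class="nav".
# The thread does not say how the real tables are identified.
NAV_TABLE = re.compile(
    r'<table\b[^>]*\bclass="nav"[^>]*>.*?</table>',
    re.IGNORECASE | re.DOTALL,
)


def strip_nav(html):
    """Remove each table matched by NAV_TABLE.

    The non-greedy match stops at the first </table>, so this only works
    when the navigation table does not itself contain another table.
    """
    return NAV_TABLE.sub("", html)


def rewrite_htmlz(src, dst, encoding="utf-8"):
    """Copy the archive src to dst, applying strip_nav to .html/.htm members.

    Returns the names of the members whose content changed.
    """
    changed = []
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.lower().endswith((".html", ".htm")):
                text = data.decode(encoding)
                new_text = strip_nav(text)
                if new_text != text:
                    changed.append(info.filename)
                data = new_text.encode(encoding)
            # Write under the original member name so paths stay unchanged.
            zout.writestr(info.filename, data)
    return changed
```

Calling `rewrite_htmlz("book.htmlz", "book-stripped.htmlz")` (file names made up) returns the list of changed members, which makes it easy to confirm the script touched only the files it was meant to.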
06-21-2011, 03:22 PM | #2 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
One common problem with some "HTML" files is that they are missing the enclosing <HTML> and </HTML> tags. That causes a "spine is empty" error during conversion, but the file is readable with a browser (which inserts the missing html tags). It's worth checking.
|
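The check suggested above is easy to script. A minimal sketch, assuming the file has already been read into a string; `has_html_wrapper` is a made-up name, and the test is a plain text search, not a real parse, so it says nothing about whether the rest of the markup is sound.

```python
import re

OPEN_HTML = re.compile(r"<html\b[^>]*>", re.IGNORECASE)
CLOSE_HTML = re.compile(r"</html\s*>", re.IGNORECASE)


def has_html_wrapper(text):
    """Return True if text has an opening <html> tag followed later by </html>."""
    opening = OPEN_HTML.search(text)
    if opening is None:
        return False
    # Only look for the closing tag after the end of the opening tag.
    return CLOSE_HTML.search(text, opening.end()) is not None
```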
06-21-2011, 08:05 PM | #3 |
Junior Member
Posts: 4
Karma: 10
Join Date: Jun 2011
Device: Kindle
|
The <html> and </html> tags are there. The file will convert correctly before I strip out the navigation controls, so something I'm removing is causing the problem. I've done some spot checking to ensure that I'm not removing anything I don't intend to with my script, and what I'm seeing is that it's doing exactly what I intend. What does a "spine is empty" error mean? That might help me narrow my search.
In the meantime I'm running the file through some HTML validators to see if they can point out the problem, but so far the file is throwing so many errors that it's not helping. Lots of closing </br> tags and other things that don't validate properly. |
06-22-2011, 12:51 AM | #4 |
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
Calibre requires valid (X)HTML to do a successful conversion, so if an HTML validator thinks there are problems with the file I would not expect Calibre to successfully convert it.
|
06-22-2011, 08:54 AM | #5 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Many errors that would cause a validation failure of the HTML will produce the "spine is empty" error. You can search for that phrase here, but I doubt it will help much. I think of it as a generic error message that conversion has failed at a late stage due to errors early in the conversion that prevented the production of valid (X)HTML.
|
06-22-2011, 01:23 PM | #6 |
creator of calibre
Posts: 43,835
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
"Spine is empty" means that no HTML documents were found. Note that calibre does not require valid (X)HTML; it is fairly forgiving in its parsing, but not infinitely so, and various errors can cause the parsing to fail.
|
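Given that explanation, a reasonable first step is to list which HTML documents the repacked archive actually contains and compare that with the original HTMLZ. The sketch below does only that; `html_members` is a made-up helper, the extension list is an assumption, and it does not reproduce how calibre itself locates or parses documents.

```python
import zipfile


def html_members(path):
    """Return the names of .html/.htm/.xhtml members of a zip archive, in archive order."""
    with zipfile.ZipFile(path) as z:
        return [
            name for name in z.namelist()
            if name.lower().endswith((".html", ".htm", ".xhtml"))
        ]
```

If the list for the repacked file differs from the list for the original (missing entries, or entries that have moved under an extra folder), the repacking step is a more likely suspect than the edited markup.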