03-16-2010, 10:27 PM   #5
charleski
Device: PRS-505
When you convert in calibre there's an option to specify a directory for debug info. If that's filled in, calibre will write out the html corresponding to each stage of its parsing and structure detection process, and sometimes you can root around in there to find a version that's easier to edit.
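For anyone who prefers the command line, here's a rough sketch of the same idea using calibre's `ebook-convert` tool and its `--debug-pipeline` option, which takes the debug directory. The helper names (`build_convert_cmd`, `convert_with_debug`, `list_html_files`) and the paths in the usage note are made up for illustration, and the layout of the debug directory may differ between calibre versions, so this just lists whatever html files turn up.

```python
import shutil
import subprocess
from pathlib import Path

HTML_SUFFIXES = {".html", ".htm", ".xhtml"}


def build_convert_cmd(src, dst, debug_dir):
    """Build the ebook-convert command line with debug output enabled."""
    return ["ebook-convert", str(src), str(dst), "--debug-pipeline", str(debug_dir)]


def list_html_files(debug_dir):
    """Return every html-like file under debug_dir, as sorted relative paths."""
    root = Path(debug_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in HTML_SUFFIXES
    )


def convert_with_debug(src, dst, debug_dir):
    """Run the conversion, then list the intermediate html calibre wrote out."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH (is calibre installed?)")
    subprocess.run(build_convert_cmd(src, dst, debug_dir), check=True)
    return list_html_files(debug_dir)
```

Something like `convert_with_debug("book.epub", "book_out.epub", "debug_out")` would then give you a list of candidate files to open and compare by hand.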

It's usually best to try to get a cleaner html file to start with, but it might be that the original was just very poorly coded. I've come across commercial ebooks that have clearly been run through calibre a couple of times by a lazy technician and ended up a mess.

If there's no way to find a cleaner version of html to start from, then there's nothing for it but to strip the tags with a sequence of regular expressions. These can be tricky, and you need to be sure to save each intermediate step in case something goes wrong. Post a couple of lines of the html so we can see what the problem is like.
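To give an idea of what a sequence of regular expressions with saved intermediate steps might look like, here's a minimal sketch in Python. The three patterns are only guesses at typical clutter (span tags, inline style attributes, empty paragraphs), not a recipe for any particular file, and the function name and file names are invented for the example. Regexes like these are fragile on html (an attribute value containing `>` will trip them up, for instance), which is exactly why each pass gets written to its own file.

```python
import re
from pathlib import Path

# Hypothetical cleanup passes: (label, pattern, replacement).
# Adjust or replace these once you've seen what the real markup looks like.
PASSES = [
    ("strip_spans", re.compile(r"</?span\b[^>]*>", re.IGNORECASE), ""),
    ("strip_inline_style", re.compile(r'\s+style\s*=\s*"[^"]*"', re.IGNORECASE), ""),
    ("drop_empty_paragraphs", re.compile(r"<p\b[^>]*>\s*</p>", re.IGNORECASE), ""),
]


def clean_html(text, out_dir, passes=PASSES):
    """Apply each regex pass in order, saving the result of every step."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Keep an untouched copy so there is always something to go back to.
    (out_dir / "step0_original.html").write_text(text, encoding="utf-8")
    for i, (name, pattern, repl) in enumerate(passes, start=1):
        text = pattern.sub(repl, text)
        (out_dir / f"step{i}_{name}.html").write_text(text, encoding="utf-8")
    return text
```

If a pass eats something it shouldn't, diff the step file against the one before it, fix the pattern, and rerun from the saved original.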