MobileRead Forums - View Single Post - Need help converting file which is too long to be HTML

charleski · 03-16-2010, 11:27 PM

When you convert in calibre there's an option to specify a directory for debug info. If that's filled in, calibre will write out the html corresponding to each stage of its parsing and structure detection process, and sometimes you can root around in there to find a version that's easier to edit.
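
If you'd rather do this from the command line, here's a rough sketch. It assumes calibre's `ebook-convert` tool is on your PATH and uses its `--debug-pipeline` option, which is the command-line counterpart of the debug directory setting. The helper names and the file paths are made up for illustration.

```python
from pathlib import Path

def build_debug_command(source, target, debug_dir):
    """Build an ebook-convert command line that also dumps the intermediate stages."""
    return ["ebook-convert", str(source), str(target),
            "--debug-pipeline", str(debug_dir)]

def list_debug_html(debug_dir):
    """Return every HTML-like file under the debug directory, smallest first."""
    root = Path(debug_dir)
    found = [p for p in root.rglob("*")
             if p.is_file() and p.suffix.lower() in {".html", ".htm", ".xhtml"}]
    return sorted(found, key=lambda p: (p.stat().st_size, str(p)))

# Example (hypothetical paths), needs calibre installed:
#   import subprocess
#   subprocess.run(build_debug_command("book.html", "book.epub", "debug"), check=True)
#   for path in list_debug_html("debug"):
#       print(path.stat().st_size, path)
```

Sorting by size is only a convenience: a much smaller file is often the one with less junk markup, but you still have to open the candidates and look.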

It's usually best to try to get a cleaner html file to start with, but it might be that the original was just very poorly coded. I've come across commercial ebooks that have clearly been run through calibre a couple of times by a lazy technician and ended up a mess.

If there's no way to find a cleaner version of html to start from, then there's nothing for it but to strip the tags with a sequence of regular expressions. These can be tricky, and you need to be sure to save each intermediate step in case something goes wrong. Post a couple of lines of the html so we can see what the problem is like.
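
To give an idea of what I mean by a sequence of regular expressions with a saved copy after each step, here's a minimal sketch. The patterns are only guesses at the kind of clutter you might have (stray `span` and `font` tags, `class` attributes); the pass names, the function name and the output folder are all made up, and the real patterns will depend on what your html looks like.

```python
import re
from pathlib import Path

# Hypothetical cleanup passes: (name, pattern, replacement).
# Adjust these to the markup that is actually in the file.
PASSES = [
    ("drop_span_tags", r"</?span\b[^>]*>", ""),
    ("drop_font_tags", r"</?font\b[^>]*>", ""),
    ("drop_class_attrs", r'\s+class="[^"]*"', ""),
    ("collapse_spaces", r"[ \t]{2,}", " "),
]

def clean_in_stages(html, out_dir, passes=PASSES):
    """Apply each regex pass in order, saving the result of every pass."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for step, (name, pattern, replacement) in enumerate(passes, start=1):
        html = re.sub(pattern, replacement, html, flags=re.IGNORECASE)
        # One numbered file per pass, so a bad pattern can be rolled back.
        (out_dir / f"{step:02d}_{name}.html").write_text(html, encoding="utf-8")
    return html

# Example (hypothetical paths):
#   text = Path("book.html").read_text(encoding="utf-8")
#   cleaned = clean_in_stages(text, "stages")
```

If one pass eats something it shouldn't, you open the numbered file from the step before, fix that one pattern, and carry on from there instead of starting over.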
