MobileRead Forums - View Single Post

JSWolf · 11-23-2011, 04:27 AM

Quote:

Originally Posted by Artha

I have 0.4.2 and am trying to convert a book from PDF to ePub by hand. I have reduced the file to bare-bones HTML and will attach a CSS file later. At this point, things should be nice and clean, with only the HTML tags and nothing more.

Yet when I hit „Generate TOC from headings” an id="heading_id_2" or id="heading_id_3" is attached to the headings. Why is that?

And can it be disabled?

You don't need the id="heading_id_2" if each chapter is a separate file. All you do in the NCX is call the file you want for each chapter entry without needing the # anchor.

What I do is use a regex to strip it. I would search for id="heading_id_[0-9]*" and replace with nothing. This works in Notepad++. I've not tried it in Sigil, so I do not know if that regex would work there. Someone may be able to fix it if it's incorrect.
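For anyone who would rather script the clean-up than run it in an editor, here is a minimal Python sketch of the same search-and-replace. The function name strip_heading_ids and the sample markup are made up for illustration. The pattern also swallows the whitespace in front of the attribute and requires at least one digit, which is a small variation on the regex above and not something Sigil or Notepad++ does for you.

```python
import re

# Optional whitespace, then id="heading_id_<digits>".
# [0-9]+ requires at least one digit; the leading \s* removes the
# space before the attribute so no stray gap is left in the tag.
HEADING_ID = re.compile(r'\s*id="heading_id_[0-9]+"')


def strip_heading_ids(html: str) -> str:
    """Remove heading_id_N id attributes from a string of HTML (hypothetical helper)."""
    return HEADING_ID.sub("", html)


if __name__ == "__main__":
    # Illustrative sample only, not taken from a real book.
    sample = (
        '<h1 id="heading_id_2">Chapter One</h1>\n'
        '<h2 class="sub" id="heading_id_3">Part A</h2>'
    )
    print(strip_heading_ids(sample))
```

Only do this if nothing else in the book links to those anchors. If the NCX or any internal link still points at #heading_id_2, that reference will break once the id is gone.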

Quote:

Originally Posted by Artha

Weird. Why would Calibre use span, or , when there's for that?

Because that's what is in the HTML generated from the PDF.

I've seen code from some conversions where there was something like text of the book in every line, and it got worse with italics. I was able to regex remove most of it and then manually remove it for every line that had italics.
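A rough sketch of that kind of clean-up in Python, assuming the junk is a span wrapped around the text of each line (the actual tag and its attributes are a guess, and the class name in the sample is invented). It unwraps only spans that contain no other tags, so lines with nested italics are left alone and still have to be fixed by hand, much like the workflow described above.

```python
import re

# A <span ...> whose content contains no other tags.
# [^<]* stops at the first "<", so a span holding <i>...</i> will not match.
PLAIN_SPAN = re.compile(r'<span\b[^>]*>([^<]*)</span>')


def unwrap_plain_spans(html: str) -> str:
    """Replace tag-free spans with their inner text (hypothetical helper)."""
    return PLAIN_SPAN.sub(r'\1', html)


if __name__ == "__main__":
    # Illustrative markup; "x" is an invented class name.
    plain = '<p><span class="x">text of the book</span></p>'
    nested = '<p><span class="x">some <i>italic</i> text</span></p>'
    print(unwrap_plain_spans(plain))   # span removed
    print(unwrap_plain_spans(nested))  # unchanged, needs manual attention
```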

With Calibre, a lot of the oddities are in the source fed to it.