Quote:
Originally Posted by LadyKate
...One of the first steps after clearing up the excess spans and font settings in an html file is to check for paragraph markings.
A book that has <br> or <br/> as a means of separating paragraphs of text is not going to allow you to use calibre to indent the first line of paragraphs....
I see what LadyKate meant about excess span and font tags in some old formats.
I've been looking at some raw book formats: unfixed original downloaded files that I kept separately outside of calibre. Specifically, these were original format files that I copied into calibre way back when. I then fixed the calibre copies with the method RTF -> Word (advanced find & replace) -> DOCX.
I added the raw unfixed originals into calibre again as separate duplicate records, converted them to EPUB, and looked at them in Edit Book.
So I saw what LadyKate was talking about. For the most part these formats were extravagantly riddled with excess span and font tags. (That was boggling. I didn't even try to fix them in Edit Book; I hadn't a clue where to start. There seemed to be more html tags than content text.) So, like LadyKate said, that overuse of spans is another common problem, in addition to break tags being used instead of paragraph tags. Because I had habitually fixed things in RTF in Word, the specific nature of the html problems had been invisible to me.
In the conversion of original to EPUB, calibre had added its own classes to that span mishmash as best it could, which seemed to make the span multitude harder to deal with.
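For anyone else staring at the same kind of span and font clutter, here is a rough Python sketch of the blunt approach: delete every span and font tag and keep the text inside. The function name and the sample line are made up for illustration, and this is not a calibre feature or anyone's recommended method. It also assumes none of the spans carry formatting you want to keep (italics done through a class, for example), so try it on a copy first.

```python
import re

# Matches opening or closing <span ...> and <font ...> tags, case-insensitive.
# Assumption: no attribute value contains a literal ">" character.
SPAN_FONT_TAG = re.compile(r"</?(?:span|font)\b[^>]*>", re.IGNORECASE)

def strip_span_font(html: str) -> str:
    """Remove span and font tags but keep the text they wrapped."""
    return SPAN_FONT_TAG.sub("", html)

# Hypothetical sample line, similar in spirit to converted output.
sample = (
    '<p><span class="calibre1"><font face="Arial">Hello</font></span> '
    '<span class="calibre2">world</span></p>'
)
print(strip_span_font(sample))
```

The same pattern should be usable in a regex search with an empty replacement, but I haven't checked every editor's regex flavor, so treat that as a guess.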
But I'm just starting to learn about this stuff on the html side. And don't really know what I'm doing there yet.
Meanwhile, I was really looking for an old raw file with a lot of break tags so I could play with theduck's search/replace regex in an html editor or Edit Book. I didn't find any of those and got distracted by the formats with the span problem.
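In the meantime, here is a small Python sketch of the general idea of turning break-separated text into real paragraphs. This is not theduck's regex (which isn't reproduced in this post), just my own illustration with a made-up function name and sample text. It assumes the input is only the running text between break tags, with no paragraph or other block tags already in it, and it treats a run of consecutive breaks as a single paragraph boundary.

```python
import re

# One or more consecutive <br>, <br/>, or <br /> tags,
# together with any whitespace around them.
BR_RUN = re.compile(r"(?:\s*<br\s*/?>\s*)+", re.IGNORECASE)

def br_to_paragraphs(body: str) -> str:
    """Split text on runs of break tags and wrap each piece in <p>...</p>."""
    chunks = [chunk.strip() for chunk in BR_RUN.split(body)]
    # Skip empty pieces left by leading or trailing break tags.
    return "\n".join(f"<p>{chunk}</p>" for chunk in chunks if chunk)

# Hypothetical sample text using all three spellings of the break tag.
sample = "First paragraph.<br>Second paragraph.<br/>\n<br />Third paragraph."
print(br_to_paragraphs(sample))
```

Once the text is in paragraph tags, a CSS rule such as `p { text-indent: 1.5em; }` has something to act on, which is the point LadyKate was making about first-line indents.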