Quote:
Originally Posted by unboggling
I see what LadyKate meant about excess span and font tags in some old formats.
I've been looking at some raw book formats: unfixed original downloaded files that I kept separately outside of calibre. Specifically these were original format files that I copied into calibre way back when, then had fixed the calibre copy with the method RTF -> Word (advanced find & replace) -> DOCX.
I added the raw unfixed originals into calibre again as separate duplicate records, converted them to EPUB, and looked at them in Edit Book.
So I saw what LadyKate was talking about. For the most part these formats were extravagantly riddled with excess span and font tags. (That was boggling. I didn't even try to fix them in Edit Book, hadn't a clue where to start. There seemed to be more html tags than content text.) So, like LadyKate said, that usage of spans is another common thing, in addition to the break tag instead of paragraph tags thing. In the past habitually fixing things in RTF in Word, the specific nature of the html problems had been invisible to me.
In the conversion of original to EPUB, calibre had added its own classes to that span mishmash as best it could. Which seemed to make the span multitude harder to deal with.
But I'm just starting to learn about this stuff on the html side. And don't really know what I'm doing there yet.
Meanwhile, I was really looking for an old raw file with a lot of break tags so I could play with theduck's search/replace regex in html editor or Edit Book. Didn't find any of those, got distracted by the formats with span problem.
|
The problem is with the way word processors work (and RTF editors are just another form of word processor).
I have not seen a word processor since the old DOS versions of WordPerfect that shows you the codes it inserts to change the look and feel of the document.
Every time you make a change, even one you don't complete, a code is inserted. You switch to italic, change your mind, delete the two characters you typed, change the color, and so on, and the file ends up with more font changes, spans, and color changes than text.
You only see the result of all these changes in a WYSIWYG word processor or web page creator, but that result is only as good as the underlying code.
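To make the leftover-code problem concrete, here is a rough Python sketch of one way to clear out the most obviously useless tags: span or font elements that are empty, and span or font elements that carry no attributes at all. This is only an illustration under my own assumptions, not the regex theduck posted and not anything calibre does internally. The name strip_redundant_tags is made up for the example, and regex on HTML is fragile, so try it on a copy of the book first.

```python
import re

# A <span> or <font> with nothing inside but optional whitespace, e.g. the
# residue of a formatting change that was started and then abandoned.
EMPTY_TAG = re.compile(r"<(span|font)\b[^>]*>(\s*)</\1>", re.IGNORECASE)

# A <span> or <font> with no attributes, which changes nothing on screen.
# The lookahead stops the match from crossing another tag of the same name,
# so nested same-name tags are left alone rather than paired up wrongly.
BARE_TAG = re.compile(
    r"<(span|font)>((?:(?!</?\1\b).)*)</\1>",
    re.IGNORECASE | re.DOTALL,
)


def strip_redundant_tags(html: str) -> str:
    """Hypothetical helper: unwrap empty and attribute-less span/font tags."""
    previous = None
    # Repeat until nothing changes, because unwrapping one tag can expose
    # another redundant tag that was wrapped around it.
    while previous != html:
        previous = html
        html = EMPTY_TAG.sub(r"\2", html)  # keep any whitespace, drop the tag
        html = BARE_TAG.sub(r"\2", html)   # keep the content, drop the tag
    return html


if __name__ == "__main__":
    sample = '<p><span class="a"></span>Hello <font><span>world</span></font></p>'
    print(strip_redundant_tags(sample))
```

Spans that carry a class or style and actually wrap text are deliberately left untouched, since those may be doing real formatting work. Deciding which of those are redundant still takes a look at the stylesheet.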