MobileRead Forums - View Single Post

unboggling · 01-25-2014, 01:55 PM

This recent discussion highlights a fundamental difference in approaches to fixing formatting problems in ebooks. These approaches seem both skill-driven and assumption-driven.

One assumption is that fixing formatting problems with regular expressions at code level is the best approach. I've noticed this assumption is common among programmers, web designers, ebook designers, and advanced calibre users. It is especially easy to make when technical knowledge and skills (regular expressions, HTML/XHTML, CSS) are at the higher end of the learning curve. I guess that demographically this is a small (but vocal) minority of the total calibre user population.
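For readers unfamiliar with what a code-level fix looks like, here is a minimal sketch in Python, not anything taken from the discussion itself. It assumes the problem is redundant `<span>` tags with no attributes; the function name and sample markup are made up for illustration, and a real book would likely need more careful handling than one regular expression.

```python
import re

# Matches a <span> that has no attributes and whose content holds no other tags.
BARE_SPAN = re.compile(r"<span>([^<]*)</span>")


def unwrap_bare_spans(html: str) -> str:
    """Repeatedly unwrap attribute-less spans until a pass changes nothing."""
    previous = None
    while previous != html:
        previous = html
        # Keep the text inside the span, drop the span tags themselves.
        html = BARE_SPAN.sub(r"\1", html)
    return html


# Hypothetical sample of span-riddled markup.
sample = "<p><span>It was </span><span><span>a dark</span> night.</span></p>"
print(unwrap_bare_spans(sample))  # prints <p>It was a dark night.</p>
```

Spans that carry attributes, such as `<span class="italic">`, are deliberately left alone, since they may hold formatting the reader wants to keep.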

A different assumption is that fixing formats above code level in a word processor works well enough. Particularly when technical knowledge and skills (regular expressions, HTML/XHTML, CSS) are at the lower end of the learning curve, it is easy to make this assumption. I made this assumption. I extrapolate that some other calibre users share this assumption, and guess that demographically this is a larger (but quieter) minority of the total calibre user population.

Consider those span-riddled original formats I looked at the other day. I had eliminated annoying formatting problems from copies of them a couple of years ago with this method: EPUB -> RTF -> fix in Word or OpenOffice Writer -> DOCX or ODT -> EPUB. About 3 minutes each. Two years later, having learned a lot since then, I look at the morass of HTML/XHTML tags and CSS in those span-plagued originals, and it seems that fixing each one at code level in Edit Book would take much longer, even if I knew how. The same goes for fixing them at code level outside calibre in a programmer-oriented editor.
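The two automated ends of that chain can be sketched as commands for calibre's `ebook-convert` command-line tool. This is only an illustration under assumptions: the post does not say whether the conversions were run from the calibre GUI or the command line, and the file names and the `conversion_chain` helper are hypothetical. The sketch only builds the command lists; the word-processor step in the middle stays manual.

```python
from pathlib import Path


def conversion_chain(epub: Path, edit_format: str = "docx") -> list[list[str]]:
    """Build the ebook-convert commands for EPUB -> RTF and DOCX/ODT -> EPUB."""
    stem = epub.with_suffix("")            # e.g. book.epub -> book
    rtf = f"{stem}.rtf"                    # file to open in the word processor
    edited = f"{stem}.{edit_format}"       # file saved from the word processor
    fixed = f"{stem}-fixed.epub"           # hypothetical name for the result
    return [
        ["ebook-convert", str(epub), rtf],
        # ... fix formatting by hand in Word or Writer, save as DOCX or ODT ...
        ["ebook-convert", edited, fixed],
    ]


for command in conversion_chain(Path("book.epub")):
    print(" ".join(command))
```

To actually run a step, each list could be passed to `subprocess.run(command, check=True)`, assuming `ebook-convert` is on the PATH.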

So the first conversion to RTF blew away the ToC links. So what? That's quickly fixable in calibre's ToC Editor after conversion to EPUB, if not fixed already by the XPath expression applied during conversion. So the "fixed" EPUB contains unnecessary tags I didn't see while editing in a word processor, and its file size is larger than if it had been fixed cleanly at code level. So what? I don't see those unnecessary tags when reading the book, and cheap storage is plentiful enough to accommodate larger files.

Assumptions aside, the approach and method for fixing formatting problems depend on need, constrained by knowledge/skill level. As an ebook consumer reading for enjoyment, I would ignore the technical side of ebooks, except where formatting problems annoy me enough to fix them, in the quickest way possible at my current knowledge/skill level. As an ebook designer, I might want the underlying code to be clean.

But I'm not an ebook designer. I'm an ebook consumer who likes reading books more than fixing books.

Post #86. Last edited by unboggling; 02-01-2014 at 01:29 AM. Reason: clarify, change to more precise or correct technical terms, fix typos.