08-08-2013, 07:48 PM   #39
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
I also mentioned keeping "long-term formats after EPUB" in mind while coding your book, because that is part of what I am trying to accomplish at work.

I work for a non-profit economics website, so it matters to us that our code is:

- Minimalistic
- Clean
- Consistent

I use nearly the same CSS file across all of our books (~150 converted so far). Some books need unique CSS, which I always place at the end of the CSS file.

Code like this can easily be copied, pasted, or ported over to whatever we need: blogs, our own site, forum debates, etc.

In the future, if I come up with an alternate way of displaying the book, it will be easy as pie to change things with simple regex, because I made sure the code was consistently formatted throughout the EPUB in the first place.

For an easy example, take footnotes. Early on I used the superscript format:

Code:
<a href="#fn3" id="ft3"><sup>3</sup></a>

LINKS TO:

<p><a href="#ft3" id="fn3"><sup>3</sup></a> Harriet Martineau’s Hist. of Eng. I. 294.</p>
Later, we decided to remove the superscripts and replace them with bracketed numbers (easier to tap on a touchscreen or smartphone, easier to read, ...).

Code:
<a href="#fn3" id="ft3">[3]</a>

LINKS TO:

<p><a href="#ft3" id="fn3">[3]</a> Harriet Martineau’s Hist. of Eng. I. 294.</p>
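Because every footnote link above follows exactly the same pattern, the switch from superscripts to brackets can be a single search-and-replace. Here is a minimal sketch using Python's `re` module (Sigil's own regex search would do the same job; Python is just for illustration). It assumes the attributes always appear in the `href`-then-`id` order shown above:

```python
import re

# Footnote link whose number is wrapped in <sup>, e.g.
#   <a href="#fn3" id="ft3"><sup>3</sup></a>   (reference in the text)
#   <a href="#ft3" id="fn3"><sup>3</sup></a>   (back-link in the note)
# Assumes the exact attribute order/spacing shown above; this only works
# because the markup was kept consistent throughout the book.
SUP_FOOTNOTE = re.compile(r'(<a href="#f[nt]\d+" id="f[nt]\d+">)<sup>(\d+)</sup>(</a>)')


def sup_to_brackets(html: str) -> str:
    """Replace <sup>N</sup> inside footnote links with [N]; other <sup> is untouched."""
    return SUP_FOOTNOTE.sub(r"\1[\2]\3", html)


if __name__ == "__main__":
    old = ('<a href="#fn3" id="ft3"><sup>3</sup></a>\n'
           '<p><a href="#ft3" id="fn3"><sup>3</sup></a> Harriet Martineau’s Hist. of Eng. I. 294.</p>')
    print(sup_to_brackets(old))
```

Running it on the superscript pair turns it into the bracketed pair. A stray `<sup>` outside a footnote link (an exponent, say) is left alone.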
But it all comes down to personal preference. I might prefer [3], you might prefer <sup>3</sup>, and another publisher might prefer full filenames and a separate Notes chapter:

Code:
<a href="Notes.xhtml#fn2.3" id="ft2.3">[3]</a>

LINKS TO:

<p><a href="Chapter2.xhtml#ft2.3" id="fn2.3">[3]</a> Harriet Martineau’s Hist. of Eng. I. 294.</p>
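Porting the bracketed single-file format to this separate-Notes-chapter format is also mechanical, as long as the markup is consistent. This is a rough sketch that handles one chapter at a time and assumes the `Notes.xhtml` / `ChapterN.xhtml` naming from the example. `to_notes_chapter` is a hypothetical helper name. It only rewrites the links; actually moving the note paragraphs into `Notes.xhtml` is left to you:

```python
import re

# Reference in the chapter text, e.g. <a href="#fn3" id="ft3">[3]</a>
# (\1 forces href, id, and label to carry the same number).
REF = re.compile(r'<a href="#fn(\d+)" id="ft\1">\[\1\]</a>')
# Back-link at the start of the note, e.g. <a href="#ft3" id="fn3">[3]</a>
NOTE = re.compile(r'<a href="#ft(\d+)" id="fn\1">\[\1\]</a>')


def to_notes_chapter(chapter_html: str, notes_html: str, chapter: int) -> tuple[str, str]:
    """Hypothetical helper: rewrite single-file footnote links to point across files.

    chapter_html: the chapter's body text; notes_html: that chapter's notes,
    destined for Notes.xhtml. Moving the note paragraphs is not done here.
    """
    chapter_file = f"Chapter{chapter}.xhtml"

    def ref(m: re.Match) -> str:
        n = m[1]
        return f'<a href="Notes.xhtml#fn{chapter}.{n}" id="ft{chapter}.{n}">[{n}]</a>'

    def note(m: re.Match) -> str:
        n = m[1]
        return f'<a href="{chapter_file}#ft{chapter}.{n}" id="fn{chapter}.{n}">[{n}]</a>'

    return REF.sub(ref, chapter_html), NOTE.sub(note, notes_html)
```

Called with `chapter=2`, it turns the `[3]` pair from the single-file example into the `fn2.3` / `ft2.3` pair shown here.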
Yet another might have a behemoth like this:

Code:
<a class="footnote-link type-footnote" href="../Text/notes.xhtml#lf1231_footnote_nt057" id="lf1231_footnote_nt057_ref">*</a>

LINKS TO:

<div class="type-footnote note" id="lf1231_footnote_nt057">
    <a href="../Text/02.xhtml#lf1231_footnote_nt057_ref" id="lf1231_label_068">*</a>

    <p>Harriet Martineau’s Hist. of Eng. I. 294.</p>
  </div>
... but as long as you stay CONSISTENT throughout your works, you can figure out the pattern and change things around. This is where human judgement, insight, and manual intervention are needed; a computer can't do it on its own. There are too many ways, preferences, and conversion tools out there, and nothing but some good regex elbow grease can fix it.

Another example is exporting from InDesign. I have a list of regexes I made specifically for that, saved in Sigil. I push a button, and most of the crappy InDesign code is cleaned up and minimized, but that only works because our typesetter uses classes CONSISTENTLY in the original InDesign file. A different typesetter or company might use classes completely differently, which would require a different set of regexes.
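To illustrate why a consistent typesetter matters: a saved list of regexes is just an ordered set of find/replace pairs, applied top to bottom. The sketch below mimics that in Python. The class names (`Body-Text`, `Block-Quote`, `Italic`, `CharOverride-N`) and the target markup are assumptions for illustration, not our actual Sigil list:

```python
import re

# Hypothetical cleanup list, applied in order like a saved group of searches.
# The class names are assumptions standing in for whatever one typesetter
# uses consistently; another typesetter would need different patterns.
CLEANUPS = [
    # Paragraph style "Body-Text" (plus any override classes) -> plain <p>
    (r'<p class="Body-Text[^"]*">', "<p>"),
    # Paragraph style "Block-Quote" -> one short class from the shared CSS
    (r'<p class="Block-Quote[^"]*">', '<p class="blockquote">'),
    # Character style "Italic" -> <i> (assumes no nested spans inside)
    (r'<span class="Italic">(.*?)</span>', r"<i>\1</i>"),
    # Leftover local-override spans carry no meaning for us -> unwrap them
    (r'<span class="CharOverride-\d+">(.*?)</span>', r"\1"),
]


def clean_indesign(html: str) -> str:
    """Apply each find/replace pair in turn, like 'Replace All' down the list."""
    for pattern, replacement in CLEANUPS:
        html = re.sub(pattern, replacement, html)
    return html
```

Order matters: the `Italic` rule runs before the generic override-unwrapping rule, so meaningful spans are converted before meaningless ones are stripped. Because `.` does not match newlines here, a span that crosses a line break would be missed, which is another thing a human has to check.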

There is no one-stop automated solution that can bash a mishmash of code into YOUR PERSONAL PREFERENCES (see the footnote code above). All you can hope for is that the original producer took some steps to make the code clean, understandable, and consistent in the first place. That will make your cleaning job much easier!

The only generic things I can think of that work for almost all EPUBs are some extremely basic regexes: changing hyphens between years into en dashes, searching for missing quotation marks, searching for broken paragraphs... but even these require a lot of human assistance, and there is no way to fully automate them.
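Here is a sketch of those extremely basic generic checks, with some assumptions. Only a hyphen between two four-digit years (1000 to 2099) is fixed automatically. Suspected broken paragraphs and paragraphs with unpaired curly double quotes are only reported, since a human still has to decide. A multi-paragraph quotation, for instance, legitimately leaves its opening quote unclosed. The function names are hypothetical:

```python
import re

# Hyphen between two four-digit years, e.g. 1815-1846 -> 1815–1846 (en dash).
# Assumption: only 1000..2099 counts as a year, so "pages 12-14" is left alone.
YEAR_RANGE = re.compile(r"\b(1\d{3}|20\d{2})-(1\d{3}|20\d{2})\b")

# Paragraph ending in a lowercase letter or comma, followed by a bare <p>
# starting lowercase: a likely broken paragraph (e.g. split at a page break).
BROKEN_PARA = re.compile(r"[a-z,]</p>\s*<p>[a-z]")

# Contents of each <p> or <p ...> element (but not <pre>).
PARA = re.compile(r"<p(?:\s[^>]*)?>(.*?)</p>", re.DOTALL)


def fix_year_dashes(html: str) -> str:
    """Safe enough to apply automatically, but still worth a skim afterwards."""
    return YEAR_RANGE.sub(lambda m: f"{m[1]}\u2013{m[2]}", html)


def find_broken_paragraphs(html: str) -> list[str]:
    """Report suspected breaks; a human decides whether to join them."""
    return [m.group(0) for m in BROKEN_PARA.finditer(html)]


def find_unbalanced_quotes(html: str) -> list[str]:
    """Report paragraphs whose curly double quotes don't pair up."""
    return [text for text in PARA.findall(html)
            if text.count("\u201c") != text.count("\u201d")]
```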

Quote:
Originally Posted by Jellby
You mean this?

https://wiki.mobileread.com/wiki/EPub_Reader_Test
(just search "epub test" in the wiki)
That is exactly it!