08-07-2013, 05:43 AM   #1
ibu
Addict
Posts: 264
Karma: 9246
Join Date: Feb 2010
Location: Berlin, Germany
Device: Kobo H20, iPhone 6+, Macbook Pro
Cleaning ePubs: automatically, fast and with as many generic rules as possible

Typical ePub XHTML is of extremely poor quality.

Inline styles.
DIVs.
SPANs.
Redundant class or id attributes.
Lack of semantic markup.
...

In other words: a horror for friends of high-quality semantic HTML.

The quality of the CSS files is no better.

Consistent use of elegant selectors? Nowhere to be found.
Any effort by the CSS authors to provide a clear and lean set of rules? Nowhere to be found.


Therefore:
you have to help yourself.

Of course there are many ways/tools for users to clean HTML and CSS manually.
But that takes too much time.

One important goal for the editing is:
not to lose important markup through radical automatic cleaning.

Typical kinds of information you want to keep are:
"This is a header"
"This is emphasized text"
"This is an unordered list"

Therefore you need a tool which allows an easy analysis of the markup.

The tool should list every element in use, both those with attributes and those without.

Example:

div
div class="foo"
span id="01zot"
span id="02zot"
p class="text-indent"
h1 id="01bar"
span style="italic"
samp
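
A minimal sketch of how such an inventory could be produced, assuming Python and its standard html.parser module. The names ElementInventory and inventory are made up for this illustration, and the sample markup is invented. Each distinct combination of element and attributes becomes one entry, written the same way as in the list above:

```python
from collections import Counter
from html.parser import HTMLParser


class ElementInventory(HTMLParser):
    """Count every distinct element/attribute combination (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Build an entry such as: div class="foo"
        parts = [tag]
        for name, value in attrs:
            parts.append(name if value is None else f'{name}="{value}"')
        self.counts[" ".join(parts)] += 1

    # Self-closing tags like <br /> are routed through handle_starttag
    # by the default handle_startendtag, so they are counted as well.


def inventory(xhtml):
    parser = ElementInventory()
    parser.feed(xhtml)
    parser.close()
    return parser.counts


# Invented sample input, only for demonstration.
sample = (
    '<div><div class="foo">x</div>'
    '<p class="text-indent">a <span style="italic">b</span> <samp>c</samp></p>'
    '<p class="text-indent">d</p></div>'
)

for entry, count in sorted(inventory(sample).items()):
    print(f"{count:3d}  {entry}")
```

Because the whole attribute string is part of the entry, elements with unique ids show up as separate entries, just like span id="01zot" and span id="02zot" above.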

Then the tool should offer actions for a highlighted group of entries:

Example:

1
Convert div class="foo" into p

2
Delete all attributes

3
Convert all span style="italic" into em

4
Convert all H3 into H4

5
Delete samp (but keep its content)

...
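
The actions above could be sketched roughly like this, again assuming Python's standard html.parser and well-formed XHTML body content (every element is either closed or self-closed). Cleaner, clean, entry_key, KEEP_ATTRS and the rule format are all invented for this illustration. One deliberate deviation: deleting literally all attributes would also remove src from images, so the sketch keeps a small whitelist, which is my assumption and not part of the list above.

```python
from html import escape
from html.parser import HTMLParser

KEEP_ATTRS = {"src", "href", "alt"}  # assumption: attributes that must survive


def entry_key(tag, attrs):
    """Render an element as e.g. 'div class="foo"' (hypothetical notation)."""
    return " ".join([tag] + [f'{n}="{v or ""}"' for n, v in attrs])


class Cleaner(HTMLParser):
    """Stream well-formed XHTML and rewrite tags according to simple rules."""

    def __init__(self, rename, unwrap):
        # convert_charrefs=False passes entities such as &amp; through unchanged.
        super().__init__(convert_charrefs=False)
        self.rename = rename      # {'div class="foo"': 'p', 'h3': 'h4', ...}
        self.unwrap = set(unwrap)  # tags to delete while keeping their content
        self.out = []
        self.stack = []            # end tag owed per open element, None if unwrapped

    def _open(self, tag, attrs, closing=""):
        if tag in self.unwrap:
            return None
        # An exact entry wins; a bare tag name means "all such elements".
        new_tag = self.rename.get(entry_key(tag, attrs)) or self.rename.get(tag, tag)
        kept = "".join(
            f' {n}="{escape(v or "", quote=True)}"'
            for n, v in attrs if n in KEEP_ATTRS
        )
        self.out.append(f"<{new_tag}{kept}{closing}>")
        return new_tag

    def handle_starttag(self, tag, attrs):
        self.stack.append(self._open(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, closing=" /")

    def handle_endtag(self, tag):
        if self.stack:
            new_tag = self.stack.pop()
            if new_tag is not None:
                self.out.append(f"</{new_tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")


def clean(xhtml, rename, unwrap=()):
    parser = Cleaner(rename, unwrap)
    parser.feed(xhtml)
    parser.close()
    return "".join(parser.out)


# Invented sample input, only for demonstration.
dirty = (
    '<div class="foo">Dolor <span style="italic">sit</span> '
    '<samp>amet</samp> &amp; more</div>'
    '<h3 id="01bar">Lorem</h3><p><img src="/images/01.jpg" class="x" /></p>'
)
rules = {'div class="foo"': "p", 'span style="italic"': "em", "h3": "h4"}
print(clean(dirty, rules, unwrap={"samp"}))
```

The rule keys reuse the notation of the element list, so a highlighted entry could be turned into a rule directly. The sketch relies on proper nesting; unclosed tags such as a bare <br> would confuse the end-tag bookkeeping.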


Of course this is just a sketch to explain what I would like to achieve.

Which tools do you use, or can you recommend, for cleaning ePubs manually and highly efficiently?

Which generic actions do you apply automatically (e.g. via scripts or plugins), before or after the manual cleaning steps?


My goal is simply lean, semantic, beautiful XHTML like this:

Code:
<h1>Lorem ipsum</h1>

<p>Dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et <em>dolore</em> magna aliqua.</p>

<p><img src="/images/01.jpg" /></p>

<p>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</p>
And I don't want to spend more than, let's say, 3 minutes on a single book.
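
To stay within a few minutes per book, the cleaning would have to run over the whole ePub in one go. A rough sketch, assuming Python's standard zipfile module, UTF-8 encoded content files, and a cleaning function of your own passed in as transform. The name clean_epub and the file names in the usage comment are invented:

```python
import zipfile


def clean_epub(src, dst, transform):
    """Copy an ePub, passing every (X)HTML member through transform(str) -> str."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # infolist() preserves archive order, so "mimetype" stays the first entry.
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.lower().endswith((".xhtml", ".html", ".htm")):
                # Assumption: content documents are UTF-8 encoded.
                data = transform(data.decode("utf-8")).encode("utf-8")
            # The "mimetype" entry has to remain uncompressed in an ePub container.
            if info.filename == "mimetype":
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            zout.writestr(info.filename, data, compress_type=method)


# Hypothetical usage:
# clean_epub("book.epub", "book-clean.epub", my_cleaning_function)
```

The source file is never modified, so a bad rule set only costs a rerun.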

Thanks.

Last edited by ibu; 08-07-2013 at 08:11 AM.