Typical ePub XHTML is of extremely poor quality.
Inline styles.
DIVs.
SPANs.
Redundant class or id attributes.
Lack of semantic markup.
...
In other words: a horror for friends of high-quality semantic HTML.
The quality of the CSS files is no better.
Consistent use of elegant selectors? Nowhere to be found.
Any intention on the part of the CSS authors to present a clear and lean set of rules? Nowhere to be found.
Therefore:
you have to help yourself.
Of course there are many ways/tools for users to clean HTML and CSS manually.
But that takes too much time.
One important goal for the editing is:
Not to lose important markup through radical automatic cleaning.
Typical kinds of information you want to keep are:
"This is a header"
"This is emphasized text"
"This is an unordered list"
Therefore you need a tool which allows an easy analysis of the markup.
The tool should list every element in use, both those with attributes (one entry per distinct attribute combination) and those without.
Example:
div
div class="foo"
span id="01zot"
span id="02zot"
p class="text-indent"
h1 id="01bar"
span style="italic"
samp
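To make the idea of such a listing concrete, here is a rough Python sketch using only the standard library. It assumes the chapter file is well-formed XHTML; the function names `local_name` and `inventory` and the sample markup are made up for illustration, not taken from any existing tool.

```python
from collections import Counter
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag or attribute name."""
    return tag.rsplit("}", 1)[-1]

def inventory(xhtml):
    """Count each distinct 'element attr="value"' signature in an XHTML string."""
    counts = Counter()
    for el in ET.fromstring(xhtml).iter():
        attrs = " ".join(
            f'{local_name(k)}="{v}"' for k, v in sorted(el.attrib.items())
        )
        # Elements without attributes are listed by their bare name.
        counts[f"{local_name(el.tag)} {attrs}".strip()] += 1
    return counts

# Hypothetical sample markup, only for demonstration.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="foo">One</div>
<div class="foo">Two</div>
<p class="text-indent">Three <span style="italic">four</span> <samp>five</samp></p>
</body></html>"""

if __name__ == "__main__":
    for signature, n in sorted(inventory(SAMPLE).items()):
        print(f"{n:3d}  {signature}")
```

Real ePub files often contain named entities such as &nbsp;, which this plain XML parser rejects, so a real tool would need extra handling for those.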
Then the tool should offer actions for a highlighted group of entries:
Example:
1. Convert div class="foo" into p
2. Delete all attributes
3. Convert all span style="italic" into em
4. Convert all h3 into h4
5. Delete samp (but keep its content)
...
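The actions above could look roughly like this in Python, again with only the standard library and assuming well-formed XHTML. All helper names (`rename`, `strip_attributes`, `unwrap`, `clean`) and the sample markup are hypothetical, and the default of keeping src and href when stripping attributes is my own assumption, so that images and links survive.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # serialize without an 'ns0:' prefix

def q(name):
    """Qualify a local element name with the XHTML namespace."""
    return f"{{{XHTML_NS}}}{name}"

def rename(root, old, new, attrs=None):
    """Rename <old> to <new>; if attrs is given, only where those attributes match."""
    for el in list(root.iter(q(old))):
        if attrs is None or all(el.get(k) == v for k, v in attrs.items()):
            el.tag = q(new)

def strip_attributes(root, keep=("src", "href")):
    """Delete every attribute except those in keep (assumed default, see above)."""
    for el in root.iter():
        for name in list(el.attrib):
            if name not in keep:
                del el.attrib[name]

def _append_text(parent, idx, text):
    """Attach text immediately before child position idx inside parent."""
    if not text:
        return
    if idx > 0:
        prev = parent[idx - 1]
        prev.tail = (prev.tail or "") + text
    else:
        parent.text = (parent.text or "") + text

def unwrap(root, name):
    """Remove every <name> element but keep its text and children in place."""
    parents = {child: parent for parent in root.iter() for child in parent}
    for el in list(root.iter(q(name))):
        parent = parents[el]
        idx = list(parent).index(el)
        children = list(el)
        _append_text(parent, idx, el.text)
        parent.remove(el)
        for offset, child in enumerate(children):
            parent.insert(idx + offset, child)
            parents[child] = parent
        _append_text(parent, idx + len(children), el.tail)

def clean(xhtml):
    """Apply the five example actions and return the serialized result."""
    root = ET.fromstring(xhtml)
    rename(root, "div", "p", {"class": "foo"})
    rename(root, "span", "em", {"style": "italic"})
    rename(root, "h3", "h4")
    unwrap(root, "samp")
    strip_attributes(root)
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample markup, only for demonstration.
SAMPLE = (
    f'<body xmlns="{XHTML_NS}">'
    '<div class="foo">Dolor <span style="italic">sit</span> <samp>amet</samp>.</div>'
    '<h3 id="01bar">Lorem</h3>'
    "</body>"
)

if __name__ == "__main__":
    print(clean(SAMPLE))
```

For the sample, the div ends up as <p>Dolor <em>sit</em> amet.</p> and the heading as <h4>Lorem</h4>. The attribute-based renames run before the attributes are stripped, since afterwards there would be nothing left to match on.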
Of course this is just a sketch to explain what I would like to achieve.
Which tools do you use, or can you recommend, for cleaning ePubs manually with high efficiency?
Which generic actions do you apply automatically (e.g. via scripts or plugins), before or after the manual cleaning?
My goal is just lean, semantic, beautiful XHTML like this:
Code:
<h1>Lorem ipsum</h1>
<p>Dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et <em>dolore</em> magna aliqua.</p>
<p><img src="/images/01.jpg" /></p>
<p>Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</p>
And I don't want to spend more than, let's say, 3 minutes on a single book.
Thanks.