01-28-2011, 06:51 AM   #8
duepixel
Device: kindle 3
Quote:
Originally Posted by Manichean
Convert to ePub and use Sigil.
I don't agree.
For me, calibre needs more flexible search and replace (S&R).

In some situations the pdf -> html -> epub conversion loses formatting information.
Example:

I have text like this (converted by calibre's pdftohtml engine):
Code:
a bit 'of sticky stuff. I spent the index and I approached him on the nose. <br>
<hr>
<A name=39></a> tomato sauce. <br>
After calibre's EPUB conversion:
Code:
<p class="calibre2">a bit 'of sticky stuff. I spent the index and I approached him on the nose.</p>
<p class="calibre2"/>
<p class="calibre2">tomato sauce.</p>
In this case, when I load the EPUB's HTML in Sigil, I cannot tell whether the empty paragraph is an intended break or just a misinterpretation of the <hr>.

With S&R I can create a regex like this:
<br>\s*<hr>\s*<A name=\d+>\s*</a>
and replace it with nothing.
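
As a rough illustration of that replacement, here is a minimal Python sketch. It assumes Python-style regex semantics and a whitespace-tolerant form of the pattern; the names `PAGE_BREAK` and `strip_page_breaks` are made up for this example and are not part of calibre.

```python
import re

# Sample pdftohtml output from the example above
# (tag spacing normalised, which is an assumption about the real output).
html = (
    "a bit 'of sticky stuff. I spent the index and I approached him on the nose. <br>\n"
    "<hr>\n"
    "<A name=39></a> tomato sauce. <br>\n"
)

# Page-boundary residue: a <br>, the <hr>, and the numbered page anchor.
# \s* tolerates newlines or spaces between the tags.
PAGE_BREAK = re.compile(r"<br>\s*<hr>\s*<A name=\d+>\s*</a>")


def strip_page_breaks(text: str) -> str:
    """Delete every <br><hr><A name=N></a> run so the sentence flows on."""
    return PAGE_BREAK.sub("", text)


cleaned = strip_page_breaks(html)
print(repr(cleaned))
```

With the page-break run gone before the EPUB conversion, the two halves of the sentence end up in the same paragraph, so no ambiguous empty paragraph is produced.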

Another example is un-wrapping:
Code:
The hottest summer of the century.<br>
Four homes lost in the corn. The major are plug-<br>
ged into the house. Six children on their bicycles<br>
epub:
Code:
<p class="calibre2">The hottest summer of the century.
Four homes lost in the corn. The major are plug-ged into the house. Six children on their bicycles</p>
In Sigil I can't simply remove every "-" character, because it is used legitimately in other places (e.g. mercedes-benz).

With S&R I can create a regex:
([^\s]\-<br>)|([^\s]\-<br>\s*)
and replace it with an empty string.

Is this wrong?
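
One way to check the un-wrapping pattern is to run it on the sample text. The following is a minimal Python sketch, assuming Python-style regex semantics. The lookbehind variant and the names `posted` and `alternative` are my own illustration, not something calibre or Sigil provides.

```python
import re

# The wrapped sample from the un-wrapping example.
sample = (
    "The hottest summer of the century.<br>\n"
    "Four homes lost in the corn. The major are plug-<br>\n"
    "ged into the house. Six children on their bicycles<br>\n"
)

# Pattern exactly as posted. [^\s] is part of the match, so the letter
# before the hyphen is deleted too, and the first alternative always wins,
# so the trailing \s* in the second alternative is never used.
posted = re.compile(r"([^\s]\-<br>)|([^\s]\-<br>\s*)")

# Hypothetical alternative: the lookbehind (?<=\S) only checks that a
# non-space character precedes the hyphen, without consuming it, and
# \s* also swallows the line break after <br>.
alternative = re.compile(r"(?<=\S)-<br>\s*")

posted_result = posted.sub("", sample)
alternative_result = alternative.sub("", sample)

print(repr(posted_result))       # contains "plu\nged": the "g" was removed
print(repr(alternative_result))  # contains "plugged"

# A hyphen that is not followed by <br> is left alone.
print(alternative.sub("", "mercedes-benz"))
```

Because the pattern only fires on a hyphen directly followed by `<br>`, ordinary hyphenated words such as mercedes-benz are untouched. A compound that happens to be split at its hyphen across a line break would still be joined, so the result needs a quick review.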