MobileRead Forums - View Single Post

mmat1 · 04-02-2012, 03:52 PM

Quote:

Originally Posted by sebito

OMG

You make it sound easy ...

I am familiar with Sigil, but I do not see that functionality in it at this time. I have tried it on my epub; in fact, both HTML files I sent you are part of the full epub. Did you use regular expressions in Sigil? Did you apply them to the HTML?

OK, that's the general strategy:
1. I noticed that none of the href values has a filename; that must be corrected first. So I merged the 2 files and prepended "../Text/015.html" to any "#a\d+?".

2. I split the merged file back into two files, and Sigil corrects the filenames automatically. Some of your links point to an anchor within the same file. Only links which now point to notes.html will be treated in the next steps.

3. I added an "id" attribute to every link that points to "notes.html", using the same number as the href, prefixed with "t" (within 015.html only).

4. Due to the weird formatting it gets a bit tougher in notes.html. First I replaced "<span class="tpublidisa70"> </span>" with " ", since I see no point in giving a blank a special format, and it makes the following regex easier.

5. Regex (in notes.html only)

Code:

Find: <a id="a(\d\d?\d?\d?\d?)(">)</a>(&nbsp;<span class="tpublidisa71">)<a href="../Text/Text.html#a65">(.+?)</a></span>
Replace: <a href="../Text/Text.html#t\1" id="a\1\2\3\4</span></a>

This uses your "<a href="../Text/Text.html#a65">" as the endpoint (well, in most cases it's just "<a href="">") and tosses it out for good.
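
For anyone who wants to try the step 5 regex outside Sigil, here is a minimal sketch in Python. It is not what I ran, just the same find/replace written for the `re` module. Two things are assumptions: the hard-coded "#a65" is widened to any href value (since most of the back-links are empty anyway), and `\d{1,5}` stands in for `\d\d?\d?\d?\d?`, which matches the same thing. The function name and the sample line are made up for illustration.

```python
import re

# Step 5 pattern. The inner back-link <a href="..."> is matched with any
# href value (assumption) instead of the literal "../Text/Text.html#a65".
FIND = re.compile(
    r'<a id="a(\d{1,5})(">)</a>'              # empty anchor carrying the note number
    r'(&nbsp;<span class="tpublidisa71">)'    # formatting span that wraps the number
    r'<a href="[^"]*">(.+?)</a></span>'       # old back-link, discarded in the result
)
REPLACE = r'<a href="../Text/Text.html#t\1" id="a\1\2\3\4</span></a>'


def relink_notes(html: str) -> str:
    """Turn each note anchor into a link back to the matching "t" id."""
    return FIND.sub(REPLACE, html)


# Hypothetical sample line, shaped like the markup the pattern expects.
sample = (
    '<a id="a12"></a>&nbsp;<span class="tpublidisa71">'
    '<a href="">12</a></span> Note text.'
)
result = relink_notes(sample)
print(result)
```

The anchor and the back-link end up as one `<a>` element that wraps the span, so the note number itself becomes the clickable way back.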

done

----------------------------------------------------

Edit: There's no special function for this within Sigil. It's just a matter of dividing the job into small steps and using regex. It is easy with a few hundred links. I guess it's still a tedious job with 181000...
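
With that many links, scripting the small steps might be less tedious than clicking through them. A rough sketch of steps 1, 3 and 4 in Python follows, one small function per step. The function names and the sample markup are hypothetical, and step 2 (the split, where Sigil fixes the file names) is only simulated with a plain string replace, so treat this as an outline under those assumptions and not as a tested workflow for a real epub.

```python
import re


def prefix_bare_anchors(html: str, filename: str = "../Text/015.html") -> str:
    """Step 1: put a file name in front of every bare href="#aN"."""
    return re.sub(
        r'href="(#a\d+)"',
        lambda m: f'href="{filename}{m.group(1)}"',
        html,
    )


def add_return_ids(html: str) -> str:
    """Step 3: add id="tN" to each link that points to notes.html#aN."""
    return re.sub(
        r'<a href="(\.\./Text/notes\.html#a(\d+))"',
        r'<a href="\1" id="t\2"',
        html,
    )


def drop_blank_span(html: str) -> str:
    """Step 4: replace the specially formatted blank with a plain blank."""
    return html.replace('<span class="tpublidisa70"> </span>', " ")


# Hypothetical one-line stand-in for the merged text file.
merged = '<p>See<a href="#a7">7</a></p>'
step1 = prefix_bare_anchors(merged)

# Step 2 happens inside Sigil; here the renamed link is simulated (assumption).
after_split = step1.replace("015.html#a7", "notes.html#a7")
step3 = add_return_ids(after_split)

# Hypothetical line from notes.html for step 4.
notes_line = '<a id="a7"></a><span class="tpublidisa70"> </span>Note'
step4 = drop_blank_span(notes_line)

print(step3)
print(step4)
```

Each function does one replace only, so the result can be checked after every step before moving on to the next one.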