Quote:
Originally Posted by mrevent
I'd like to have the main text of the book not interrupted with those numerous footnotes (which would be the case were I to attempt to convert the "printed" version of the page to epub) [...].
|
Looks like working from their "print view" would still be your best bet.
All the footnote text is there, and the markup is clean.
All you'd have to do is use a few regexes to convert their code into the EPUB footnote HTML already discussed.
In their "print view", each of the "pages" has this basic form:
- <hr> between pages
- <div> page number
- <p class="normal"> or <p class="noindent"> for basic paragraphs
- <p>[##] for footnote paragraphs
- <sup class="ref">[##]</sup> for footnote numbers.
Here's the relevant code for page 8:
Code:
<hr class="pb"/>
<div align="center">
― <span class="run-head">8</span> ―
</div>
<p class="noindent">trade, however, the British exporters had to compete to a greater degree in noncolonial markets, where entry for British goods was no easier than for German ones.<sup class="ref">[20]</sup> [...]</p>
<p class="normal">The German and British wool textile industries resembled each other another way: in both countries, the majority of factories in this branch operated under the principal ownership of family partners.<sup class="ref">[21]</sup> [...]</p>
<p>[20] For comments on German competition in the American market for woolens and worsteds, see <i>Textile Manufacturer</i> , June 15, 1884, p. 244. [...]</p>
<p>[21] <i>Textile Mercury</i> , March 28, 1914, p. 253.</p>
So, what I'd do is 2 regexes:
Search: <sup class="ref">\[(\d+)\]</sup>
Replace: <a class="ref" href="#fn\1" id="ft\1">[\1]</a>
Search: <p>\[(\d+)\]
Replace: <p class="footnote"><a href="#ft\1" id="fn\1">[\1]</a>
That gets you clickable footnotes in the EPUB: each reference links to its note, and each note links back to its reference.
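If you'd rather script it than run the searches by hand in an editor, here's a rough Python sketch of the same two regexes. The function name link_footnotes and the sample string are just mine for illustration; the patterns and replacements are the ones above.

```python
import re

# The two search patterns from the post, compiled once.
REF_PATTERN = re.compile(r'<sup class="ref">\[(\d+)\]</sup>')
NOTE_PATTERN = re.compile(r'<p>\[(\d+)\]')


def link_footnotes(html: str) -> str:
    """Apply both search/replace pairs to a chunk of print-view HTML."""
    # In-text reference: becomes a link that jumps to the note (#fnN).
    html = REF_PATTERN.sub(r'<a class="ref" href="#fn\1" id="ft\1">[\1]</a>', html)
    # Footnote paragraph: gets a class and a link back to the reference (#ftN).
    html = NOTE_PATTERN.sub(r'<p class="footnote"><a href="#ft\1" id="fn\1">[\1]</a>', html)
    return html


# Hypothetical two-line sample, shortened from the page 8 code.
sample = (
    '<p class="normal">family partners.<sup class="ref">[21]</sup></p>\n'
    '<p>[21] <i>Textile Mercury</i> , March 28, 1914, p. 253.</p>'
)
converted = link_footnotes(sample)
print(converted)
```

One thing to watch: the ids are built from the footnote number alone, so if the numbering restarts anywhere within the same HTML file you'd end up with duplicate ids and would need to add a prefix.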
Now you'd just be left with the typical HTML cleanup:
- Removing page code (or converting to RPNs ["Real Page Numbers"]).
- Shifting all footnotes to the end-of-file.
- Merging split paragraphs.
- [...]
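The first two cleanup items can be roughed out in Python as well. This is only a sketch under a few assumptions: every page header looks exactly like the page 8 sample, the footnote paragraphs already carry class="footnote" from the second regex, and the name tidy and its keep_page_numbers switch are made up for this example. The pagebreak span is the EPUB 3 way of marking a page boundary, and it needs the epub namespace declared on the html element. Merging split paragraphs is not attempted here.

```python
import re

# Page header as in the page 8 sample: <hr class="pb"/> plus a centered <div>
# wrapping the page number.
PAGE_HEADER = re.compile(
    r'<hr class="pb"/>\s*<div align="center">[^<]*'
    r'<span class="run-head">(\d+)</span>[^<]*</div>\s*'
)
# Footnote paragraphs, assuming the second regex has already been applied.
FOOTNOTE = re.compile(r'<p class="footnote">.*?</p>\s*', re.DOTALL)


def tidy(html: str, keep_page_numbers: bool = False) -> str:
    """Drop (or convert) page headers and move footnotes to the end."""
    if keep_page_numbers:
        # Keep the page number as an EPUB 3 page-break marker.
        marker = r'<span epub:type="pagebreak" id="page\1" title="\1"/>'
    else:
        marker = ''
    html = PAGE_HEADER.sub(marker, html)

    # Collect the notes in document order, then strip them from the body.
    notes = [note.strip() for note in FOOTNOTE.findall(html)]
    body = FOOTNOTE.sub('', html).rstrip()
    if notes:
        return body + '\n' + '\n'.join(notes) + '\n'
    return body + '\n'


# Hypothetical sample; \u2015 is the horizontal bar around the page number.
sample = (
    '<hr class="pb"/>\n<div align="center">\n'
    '\u2015 <span class="run-head">8</span> \u2015\n</div>\n'
    '<p class="noindent">goods.<a class="ref" href="#fn20" id="ft20">[20]</a></p>\n'
    '<p class="footnote"><a href="#ft20" id="fn20">[20]</a> Note one.</p>\n'
    '<p class="normal">partners.<a class="ref" href="#fn21" id="ft21">[21]</a></p>\n'
    '<p class="footnote"><a href="#ft21" id="fn21">[21]</a> Note two.</p>\n'
)
tidied = tidy(sample)
print(tidied)
```

Run it after the footnote regexes, not before, since it finds the notes by their class="footnote" attribute.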