Old 07-25-2021, 05:30 PM   #67
Tex2002ans
Quote:
Originally Posted by mrevent
I'd like to have the main text of the book not interrupted with those numerous footnotes (which would be the case were I to attempt to convert the "printed" version of the page to epub) [...].
Looks like working from their "print view" would still be your best bet.

All the footnote text is there, and the code is clean.

All you'd have to do is use a few regexes to convert their code into the EPUB footnote HTML already discussed.

Quote:
Originally Posted by mrevent
The website in question is a great one: UC Press e-books collection, where 700 of them are accessible by the public.

The book in question is The Fabrication of Labor by R. Biernacki.
In their "print view", each of the "pages" has this basic form:
  • <hr> between pages
  • <div> for the page number
  • <p class="normal"> or <p class="noindent"> for basic paragraphs
  • <p>[##] for footnote paragraphs
  • <sup class="ref">[##]</sup> for footnote numbers.

Here's the relevant code for page 8:

Spoiler:
Code:
  <hr class="pb"/>

  <div align="center">
    ― <span class="run-head">8</span> ―
  </div>

  <p class="noindent">trade, however, the British exporters had to compete to a greater degree in noncolonial markets, where entry for British goods was no easier than for German ones.<sup class="ref">[20]</sup> [...]</p>

  <p class="normal">The German and British wool textile industries resembled each other another way: in both countries, the majority of factories in this branch operated under the principal ownership of family partners.<sup class="ref">[21]</sup> [...]</p>

  <p>[20] For comments on German competition in the American market for woolens and worsteds, see <i>Textile Manufacturer</i> , June 15, 1884, p. 244. [...]</p>

  <p>[21] <i>Textile Mercury</i> , March 28, 1914, p. 253.</p>


So, what I'd do is run 2 regexes:

Search: <sup class="ref">\[(\d+)\]</sup>
Replace: <a class="ref" href="#fn\1" id="ft\1">[\1]</a>

Search: <p>\[(\d+)\]
Replace: <p class="footnote"><a href="#ft\1" id="fn\1">[\1]</a>

That gets you clickable EPUB footnotes, linked in both directions (reference to note, and note back to reference).
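
If you'd rather script it than run the search/replace by hand in an editor, here is a minimal Python sketch of the same 2 regexes. The function name link_footnotes and the sample string are just for illustration, and it assumes every footnote number appears once per file (a repeated number would produce duplicate ids).

```python
import re


def link_footnotes(html: str) -> str:
    """Apply the two search/replace steps to a chunk of print-view HTML."""
    # Step 1: in-text markers become links pointing down to the note.
    html = re.sub(
        r'<sup class="ref">\[(\d+)\]</sup>',
        r'<a class="ref" href="#fn\1" id="ft\1">[\1]</a>',
        html,
    )
    # Step 2: note paragraphs get a back-link pointing up to the marker.
    html = re.sub(
        r'<p>\[(\d+)\]',
        r'<p class="footnote"><a href="#ft\1" id="fn\1">[\1]</a>',
        html,
    )
    return html


# Hypothetical sample, shortened from the page 8 code above.
sample = (
    '<p class="normal">... family partners.<sup class="ref">[21]</sup></p>\n'
    '<p>[21] <i>Textile Mercury</i> , March 28, 1914, p. 253.</p>'
)
print(link_footnotes(sample))
```

Run it on a copy of the file first and spot-check a few notes, since a stray <p>[##] in the main text would get converted too.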

Now you'd just be left with the typical HTML cleanup:
  • Removing page code (or converting to RPNs ["Real Page Numbers"]).
  • Shifting all footnotes to the end-of-file.
  • Merging split paragraphs.
  • [...]
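
The first two cleanup items can be roughed out in Python as well. This is only a sketch under a few assumptions: the page code always looks like the page 8 sample, the footnotes have already been converted to <p class="footnote">, and each footnote is a single paragraph. The epub:type="pagebreak" span is one common EPUB 3 way to mark RPNs, not the only one, and the function names and the "footnotes" wrapper class are made up for the example. Merging split paragraphs is left out because it needs a manual check.

```python
import re

# Page code as it appears in the print view: <hr>, then the centered page number.
PAGE_RE = re.compile(
    r'<hr class="pb"/>\s*<div align="center">\s*―\s*'
    r'<span class="run-head">(\d+)</span>\s*―\s*</div>'
)
# One already-converted footnote paragraph (assumed to be a single <p>).
NOTE_RE = re.compile(r'[ \t]*<p class="footnote">.*?</p>\n?', re.DOTALL)


def pages_to_markers(html: str) -> str:
    """Replace each page header with an inline page-break marker."""
    return PAGE_RE.sub(
        r'<span epub:type="pagebreak" id="page\1" title="\1"/>', html
    )


def move_notes_to_end(html: str) -> str:
    """Pull every footnote paragraph out of the flow and append it at the end."""
    notes = [note.strip() for note in NOTE_RE.findall(html)]
    if not notes:
        return html
    body = NOTE_RE.sub('', html).rstrip()
    return body + '\n\n<div class="footnotes">\n' + '\n'.join(notes) + '\n</div>\n'


# Hypothetical two-paragraph page with one note sitting in the middle.
page = (
    '<hr class="pb"/>\n\n<div align="center">\n  ― <span class="run-head">8</span> ―\n</div>\n\n'
    '<p class="normal">Text.<a class="ref" href="#fn21" id="ft21">[21]</a></p>\n\n'
    '<p class="footnote"><a href="#ft21" id="fn21">[21]</a> Note.</p>\n\n'
    '<p class="noindent">More text.</p>\n'
)
print(move_notes_to_end(pages_to_markers(page)))
```

Pages numbered with roman numerals (front matter) would not match the (\d+) in the page regex, so those would need a separate pass.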

Last edited by Tex2002ans; 07-25-2021 at 05:33 PM.