Quote:
Originally Posted by ldolse
OK, a little play with the wizard indicates that this will work for the title:
<p class="calibre1">[0-9]+ Mexico</p>
but I'm not sure how to generalise it so that it also takes out the variable phrases which form the chapter names.
I need something that takes out stuff like
<p class="calibre1">The Cactus and the Maguey 11</p> where the text can be any phrase fragment followed by a number.
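Thinking aloud, something along these lines might cover both cases. It is only a sketch in plain Python, not tested against the real book: the name strip_headers and the 60-character cap on the phrase are my own assumptions, and it would also eat any genuine short paragraph that happens to end in a bare number, so the result needs checking.

```python
import re

# Assumption: a running header is a short paragraph that either starts with a
# page number ("12 Mexico") or ends with one ("The Cactus and the Maguey 11").
# The 60-character cap is a guess to avoid swallowing long body paragraphs.
HEADER = re.compile(
    r'<p class="calibre1">\s*(?:[0-9]+\s+Mexico|[^<>]{1,60}?\s+[0-9]+)\s*</p>'
)

def strip_headers(html):
    """Remove paragraphs that look like page headers (hypothetical helper)."""
    return HEADER.sub("", html)

sample = (
    '<p class="calibre1">The Cactus and the Maguey 11</p>'
    '<p class="calibre1">12 Mexico</p>'
    '<p class="calibre1">He rode north for three days.</p>'
)
print(strip_headers(sample))
```

The pattern string on its own is the part that would go into the wizard; the rest is just there to try it out on a sample.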
Still, it's a start, thanks.
Inspecting the .mobi output, the above regex has done its job, but I also now see that the text is littered with OCR scan errors, so various corruptions of Mexico still sneak through.
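For the corrupted variants, one idea is to match "number plus one short word" and then test how close that word is to Mexico. This is a rough standalone Python sketch using difflib from the standard library, not something the wizard itself runs; the names looks_like and strip_fuzzy_headers and the 0.6 threshold are made up for illustration and would need tuning.

```python
import re
from difflib import SequenceMatcher

# A page number followed by a single word of 4 to 8 characters.
PARA = re.compile(r'<p class="calibre1">\s*([0-9]+)\s+([^<>\s]{4,8})\s*</p>')

def looks_like(word, target="Mexico", threshold=0.6):
    """True if word is similar enough to target (threshold is a guess)."""
    ratio = SequenceMatcher(None, word.lower(), target.lower()).ratio()
    return ratio >= threshold

def strip_fuzzy_headers(html):
    """Drop 'NN <word>' paragraphs whose word resembles the book title."""
    def repl(match):
        return "" if looks_like(match.group(2)) else match.group(0)
    return PARA.sub(repl, html)

sample = (
    '<p class="calibre1">14 Mexlco</p>'
    '<p class="calibre1">15 Mcxico</p>'
    '<p class="calibre1">16 horses</p>'
)
print(strip_fuzzy_headers(sample))
```

A lower threshold catches more OCR damage but raises the chance of deleting real text, so it is worth printing what gets removed before trusting it.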
I read elsewhere that there are no good sources for this and for several other books by James Michener, and no Kindle versions on sale either.