Old 10-10-2010, 10:50 AM   #3
cybmole
Quote:
Originally Posted by ldolse
ok -

A little play with the wizard indicates that this will work for the title lines:
<p class="calibre1">[0-9]+ Mexico</p>

but I'm not sure how to generalise it so that it also takes out the variable phrases that form the chapter names.
I need something that removes lines like
<p class="calibre1">The Cactus and the Maguey 11</p>
where the text can be any phrase fragment followed by a number.
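One possible generalisation, as a rough sketch rather than a tested calibre recipe: allow either form inside the paragraph, "number then Mexico" or "short phrase then number". The Python below is only a harness for trying the pattern outside calibre; the 60-character cap on the phrase and the helper name strip_headers are my own assumptions, not anything from the wizard.

```python
import re

# Running header lines in the converted HTML, in either of two forms:
#   "<page number> Mexico"   or   "<chapter phrase> <page number>"
# The 60-character cap on the phrase is an assumed safety limit so that
# long body paragraphs which happen to end in a number are left alone.
HEADER_RE = re.compile(
    r'<p class="calibre1">(?:[0-9]+ Mexico|[^<]{1,60}? [0-9]+)</p>\n?'
)

def strip_headers(html: str) -> str:
    """Remove every paragraph that matches the header pattern."""
    return HEADER_RE.sub("", html)

sample = (
    '<p class="calibre1">12 Mexico</p>\n'
    '<p class="calibre1">The Cactus and the Maguey 11</p>\n'
    '<p class="calibre1">He rode into the city at dawn.</p>\n'
)
print(strip_headers(sample))
```

The obvious risk is a genuine short paragraph that ends in a number (a year, say), which this would also delete, so it is worth checking the matches in the wizard before converting.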

still, it's a start, thanks.
Inspecting the .mobi output, the above regex has done its job, but I now also see that the text is littered with OCR scan errors, so various corruptions of "Mexico" still sneak through.
I read elsewhere that there are no good sources for this and several other books by James Michener, and no Kindle versions on sale either.
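For the OCR-corrupted headers, an exact regex cannot list every misspelling, so one idea is to match "number then one short word" and drop the line only when that word is close to "Mexico". This is a sketch of a clean-up step run outside calibre, using Python's standard difflib; the function names and the 0.6 similarity threshold are guesses of mine and would need tuning against the real text.

```python
import re
from difflib import SequenceMatcher

# A paragraph holding only "<page number> <one short word>".
PAGE_HEADER_RE = re.compile(
    r'<p class="calibre1">([0-9]+) ([^<\s]{3,9})</p>\n?'
)

def looks_like(word: str, target: str = "Mexico", threshold: float = 0.6) -> bool:
    """True when word is similar enough to target (threshold is an assumption)."""
    ratio = SequenceMatcher(None, word.lower(), target.lower()).ratio()
    return ratio >= threshold

def strip_fuzzy_headers(html: str) -> str:
    """Drop header paragraphs whose word is a near match for 'Mexico'."""
    def repl(match: re.Match) -> str:
        # Keep the paragraph unchanged when the word is not a near match.
        return "" if looks_like(match.group(2)) else match.group(0)
    return PAGE_HEADER_RE.sub(repl, html)

ocr_sample = (
    '<p class="calibre1">14 Mexlco</p>\n'
    '<p class="calibre1">15 Mcxico</p>\n'
    '<p class="calibre1">16 apples</p>\n'
)
print(strip_fuzzy_headers(ocr_sample))
```

This only covers the "number then title" form; corrupted chapter-name headers would still need the more general pattern or a manual pass.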

Last edited by cybmole; 10-10-2010 at 11:10 AM.