MobileRead Forums

cybmole · 10-10-2010, 10:50 AM

Quote:

Originally Posted by ldolse

Yes, read up here:
http://calibre-ebook.com/user_manual/regexp.html

ok -

a little play with the wizard indicates that this will work for the title headers:
<p class="calibre1">[0-9]+ Mexico</p>
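A quick way to sanity-check a pattern like this outside calibre is Python's `re` module. calibre's regular expressions are Python-style, so a simple pattern such as this one should behave the same way, but that is an assumption worth checking in the wizard. The sample HTML below is made up for illustration.

```python
import re

# The header pattern from the post: a calibre1 paragraph holding a page number and "Mexico".
TITLE_HEADER = re.compile(r'<p class="calibre1">[0-9]+ Mexico</p>')

# Hypothetical sample of the converted HTML (not taken from the actual book).
sample = (
    '<p class="calibre1">112 Mexico</p>\n'
    '<p class="calibre2">The road south ran through the desert.</p>\n'
    '<p class="calibre1">113 Mexico</p>\n'
)

# Delete every matching header line, leaving the body text in place.
cleaned = TITLE_HEADER.sub('', sample)
print(cleaned)
```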

but I'm not sure how to generalise it so that it also takes out the variable phrases that form the chapter names?
I need something that takes out lines like
<p class="calibre1">The Cactus and the Maguey 11</p>
where the text can be any phrase fragment followed by a number?
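One possible generalisation, sketched under some assumptions: both kinds of stray line use the `calibre1` class, the chapter-title headers are short (the 60-character cap below is an arbitrary guess), and the page number has at most four digits. The length cap is there so that an ordinary paragraph that happens to end in a number (a year, say) is less likely to be removed. The sample lines are invented, and the pattern has not been tried against the real book.

```python
import re

# Assumed shape of the stray lines, both inside <p class="calibre1">:
#   "112 Mexico"                     -> page number followed by the book title
#   "The Cactus and the Maguey 11"   -> up to 60 characters of text, a space,
#                                       then a 1-4 digit page number
HEADERS = re.compile(
    r'<p class="calibre1">(?:[0-9]+ Mexico|[^<]{1,60} [0-9]{1,4})</p>'
)

sample = (
    '<p class="calibre1">112 Mexico</p>\n'
    '<p class="calibre1">The Cactus and the Maguey 11</p>\n'
    # A long body paragraph ending in a number; the 60-character cap keeps it.
    '<p class="calibre1">Nobody in the village could remember a drought '
    'as severe as that of 1910</p>\n'
)

cleaned = HEADERS.sub('', sample)
print(cleaned)
```

The part between `r'` and `'` is the regex itself; it should be usable in calibre's header-removal box, but short body paragraphs ending in a number would still be caught, so it is worth previewing the result before converting.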

still, it's a start, thanks.
inspecting the .mobi output, I can see that the above regex has done its job, but the text is also littered with OCR scan errors, so various corruptions of "Mexico" still sneak through.
I read elsewhere that there are no good sources for this or for several other books by James Michener, and no Kindle versions on sale either.
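For the OCR-damaged headers, one option is to loosen the pattern rather than list every misspelling. The sketch below assumes (without having seen the actual errors) that the damage looks like digits read as look-alike letters (`l`, `I`, `O`), a missing space, or letters in "Mexico" replaced by other characters. The corrupted samples are hypothetical.

```python
import re

# Loosened header pattern; the kinds of OCR damage it tolerates are assumptions:
#  - the page number may contain look-alike letters l, I or O (1-4 characters),
#  - the space after the number may be missing,
#  - "Mexico" may be garbled, so accept any 5-7 character word starting with M.
LOOSE_HEADER = re.compile(r'<p class="calibre1">[0-9lIO]{1,4}\s*M\w{4,6}</p>')

# Hypothetical corrupted headers, plus one body paragraph that must survive.
sample = (
    '<p class="calibre1">112 Mexic0</p>\n'
    '<p class="calibre1">l13 Mcxico</p>\n'
    '<p class="calibre1">114Mexico</p>\n'
    '<p class="calibre1">He rode north to Monterrey.</p>\n'
)

cleaned = LOOSE_HEADER.sub('', sample)
print(cleaned)
```

The looser the pattern, the more it risks: a genuine short line such as "12 Madrid" or "I Meant" would also match, so this is a trade-off to check against the preview rather than a safe default.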