10-10-2010, 01:54 PM   #6
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
That has been most helpful - thanks to all. I have learned a little regex and also learned that my source is not that great.
Here's what someone on another forum has to say about the Michener sources (maybe I'll have to go get paper copies for some of these!):

'Centennial' and 'Chesapeake' are very good and seem to have been professional lits.
'The Novel' is all one paragraph...
'Hawaii' seems to have been written completely in italics and is littered with page numbers. There seems to be something wrong with the lit file as its html is missing important elements. Calibre is unable to read the pdb version.
'The Bridges at Toko-Ri' is reasonably good.
'Space' is the result of an automated conversion that didn't really work.
'Recessional' has the odd error and has lost its structure, but is otherwise readable.
'Poland' is another automated conversion that hasn't been cleaned up.
'The Covenant', 'Legacy', 'The Source' and 'Mexico' appear to be the raw output of OCR scans and are full of errors. Someone with the original text to hand would need to do a lot of work on these before they were remotely readable.....