Today, 11:21 AM #9
theducks
Well trained by Cats
Posts: 31,560
Karma: 62543878
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2, Kobo Aura2v1, K4NT (Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by lomkiri
I realized just now that this is probably what the OP was looking for.

Anyway, my regex-function gives another view of the error list: in a book with many files and many proper names, it may show more quickly which files need to be inspected. It also makes it possible to inspect only one or a few files instead of the whole epub.
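For anyone who wants the same per-file view outside the editor, here is a minimal Python sketch. It is not lomkiri's function: `per_file_report`, `WORD_RE` and the `is_known` callback are hypothetical names, and a real checker would plug an actual dictionary lookup into `is_known`. The report further down shows error counts much larger than the word lists, so this sketch counts every unknown occurrence as an error but lists each distinct word only once.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical word pattern: runs of Unicode letters, allowing one internal
# ASCII apostrophe (so "didn't" stays a single token).
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")


def per_file_report(
    files: Dict[str, str],
    is_known: Callable[[str], bool],
) -> List[Tuple[str, int, int, List[str]]]:
    """Return (file name, error count, word count, distinct unknown words) per file.

    The error count includes every unknown occurrence; the word list holds
    each unknown word once, in the order it was first found.
    """
    report = []
    for name, text in files.items():
        words = WORD_RE.findall(text)
        bad = [w for w in words if not is_known(w)]
        distinct = list(dict.fromkeys(bad))  # drop repeats, keep first-seen order
        report.append((name, len(bad), len(words), distinct))
    return report
```

Passing a dict with only one or two files inspects just those files, rather than the whole epub.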

Anyway, it was fun to do ;-)

An example of the output:

"Lower words before check" is False
3 files scanned, 1106 errors in it
==============================

file OEBPS/@public@vhost@g@gutenberg@html@files@844@844-h@844-h-2.htm.html: , 204 error(s), 5457 words
{'Lætitia', 'Grosvenor', 'Cardew', 'realised', 'Maxbohm', 'Markby', 'htm', 'tm', 'Melan', 'Migsby', 'Gower', 've', 'unenforceability', 'gutenberg', 'Bayswater', 'nonproprietary', 'Algy', 'eBook', 'Bracknell', 'MERCHANTIBILITY', 'Magley', 'Anabaptists', 'Mobbs', 'savour', 'Worthing', 'Mallam', 'EIN', 'Newby', 'dirs', 'F3', 'didn', 'unlink', 'couldn', 'EBOOK', 'gbnewby', 'www', 'eBooks', 'Leamington', 'pglaf', 'PGLAF', 'Dumbleton', 'Algernon', 'Gwendolen', 'Moncrieff', 'http', 'basinette'}

file OEBPS/@public@vhost@g@gutenberg@html@files@844@844-h@844-h-0.htm.html: , 474 error(s), 9544 words
{'wouldn', 'doesn', 'Bunbury', 'Grosvenor', 'Cardew', 'programme', 'Belgrave', 'Lætitia', 'theatre', 'natured', 'demeanour', 'Peile', 'shouldn', 've', 'Merriman', 'Hertfordshire', 'civilised', 'gutenberg', 'Algy', 'Shoreman', 'Bloxham', 'eBook', 'Bunburying', 'Bracknell', 'Didn', 'recognised', 'isn', 'Worthing', 'patronising', 'mustn', 'weren', 'Harbury', 'Canninge', 'didn', 'slightingly', 'hadn', 'shallying', 'realise', 'couldn', 'wasn', 'EBOOK', 'ccx074', 'neighbours', 'fibres', 'Mudie', 'Niel', 'Methuen', 'Fairfax', 'www', 'grey', 'pglaf', 'Couldn', 'Leclercq', 'Hallo', 'favour', 'debonnair', 'Vanbrugh', 'Marechal', 'Bunburyed', 'Tunbridge', 'Bunburyists', 'Aynesworth', 'colour', 'shilly', 'Farquhar', 'Woolton', 'Bunburyist', 'Algernon', 'Gwendolen', 'Moncrieff', 'THEATRE', 'Dyall', 'Egeria', 'demoralising'}

file OEBPS/@public@vhost@g@gutenberg@html@files@844@844-h@844-h-1.htm.html: , 428 error(s), 9260 words
{'Isn', 'wouldn', 'doesn', 'Bunbury', 'lorgnettte', 'Cardew', 'defence', 'scepticism', 'Belgrave', 'Messrs', 'Markby', 'neologistic', 'shouldn', 've', 'Merriman', 'Hertfordshire', 'Gervase', 'Algy', 'practised', 'aren', 'Bunburying', 'Fifeshire', 'Bracknell', 'Jouet', 'horticulturally', 'isn', 'Worthing', 'amongst', 'favourable', 'didn', 'usen', 'Dorking', 'hadn', 'realise', 'couldn', 'shan', 'pretence', 'Fairfax', 'candour', 'neighbourhood', 'honour', 'Couldn', 'draughts', 'favour', 'colour', 'marvellous', 'womanthrope', 'Bunburyist', 'Algernon', 'Gwendolen', 'Moncrieff'}

(I could have sorted the list in alphabetical order; that would be easy to add.)
I would leave the list in the order FOUND, i.e. just as it is.
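One caveat, assuming the function is written in Python as the `{'...'}` braces in the output suggest: those lists look like Python sets, and a set has no defined order, so printing it "as is" does not actually give the order found. The sketch below (the function names are hypothetical) shows both options. Deduplicating through a dict keeps first-seen order, which Python guarantees from 3.7 on, while `sorted` with `str.casefold` gives a case-insensitive alphabetical list.

```python
from typing import Iterable, List


def unique_in_found_order(words: Iterable[str]) -> List[str]:
    """Drop repeats but keep each word where it first appeared."""
    # dict keys preserve insertion order (Python 3.7+); a set does not.
    return list(dict.fromkeys(words))


def unique_alphabetical(words: Iterable[str]) -> List[str]:
    """Drop repeats and sort alphabetically, ignoring case."""
    return sorted(set(words), key=str.casefold)


found = ["Worthing", "Algy", "ve", "Algy", "Bracknell"]
print(unique_in_found_order(found))  # ['Worthing', 'Algy', 've', 'Bracknell']
print(unique_alphabetical(found))    # ['Algy', 'Bracknell', 've', 'Worthing']
```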