04-03-2011, 04:41 PM   #49
kiwidude
calibre/Sigil Developer
Posts: 4,230
Karma: 1345754
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
v1.2 Released

Firstly, thanks to drMerry for the suggestions and testing in this thread. Feedback from several of you has made it obvious that the original regex used in this plugin was extremely conservative. For this release I have used a variant of what drMerry proposed (it no longer looks for textual prefixes like ISBN), which significantly increases the match rate.
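
To illustrate prefix-free matching, here is a rough Python sketch. It is not the plugin's actual regex: the pattern and the helper names `CANDIDATE`, `is_valid_isbn` and `find_isbns` are assumptions made up for this example. The idea is to accept any run of 10 to 13 digits (hyphens or spaces allowed, optional trailing X) and then let the standard ISBN check digit weed out the false positives that dropping the "ISBN" prefix lets in.

```python
import re

# Hypothetical pattern (NOT the plugin's real regex): 10 to 13 characters,
# digits optionally separated by single hyphens or spaces, where only the
# final character may be X. No "ISBN" prefix is required.
CANDIDATE = re.compile(
    r"(?<![\dXx-])\d(?:[- ]?\d){8,11}[- ]?[\dXx](?![\dXx-])",
    re.ASCII,
)


def is_valid_isbn(text):
    """Return True if text is an ISBN-10 or ISBN-13 with a correct check digit."""
    chars = re.sub(r"[- ]", "", text).upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", chars):
        # ISBN-10: weights 10 down to 1, X counts as 10, sum divisible by 11.
        total = sum(
            (10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(chars)
        )
        return total % 11 == 0
    if re.fullmatch(r"[0-9]{13}", chars):
        # ISBN-13: alternating weights 1 and 3, sum divisible by 10.
        total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(chars))
        return total % 10 == 0
    return False


def find_isbns(text):
    """Return the checksum-valid ISBNs in text, stripped of separators."""
    return [
        re.sub(r"[- ]", "", m.group(0)).upper()
        for m in CANDIDATE.finditer(text)
        if is_valid_isbn(m.group(0))
    ]
```

For example, `find_isbns("978-0-306-40615-7, or call 1234567890")` keeps the first number and drops the second, because the phone-like number fails the ISBN-10 checksum.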

I have also replaced the PDF processing with something that is many orders of magnitude faster, by only scanning the first 10 and last 5 pages of a PDF.

Changes in v1.2:
  • Rewritten for new plugin infrastructure in Calibre 0.7.53
  • ISBN matching regex replaced
  • PDFs now processed with new Calibre PDF engine to scan just the first 10 and last 5 pages
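
The page selection described above can be sketched as follows. This is only an illustration of the "first 10, last 5" rule; it does not call the Calibre PDF engine, and the function name `pages_to_scan` and its parameters are hypothetical. The one subtlety is short documents, where the two ranges would otherwise overlap and pages would be scanned twice.

```python
def pages_to_scan(page_count, first=10, last=5):
    """Return zero-based page indices: the first `first` and last `last` pages.

    Hypothetical helper for illustration only. Short documents are handled
    so that no page index appears twice.
    """
    if page_count <= 0:
        return []
    head_end = min(first, page_count)
    # Start the tail no earlier than where the head stopped.
    tail_start = max(page_count - last, head_end)
    return list(range(head_end)) + list(range(tail_start, page_count))
```

For a 300-page PDF this yields 15 page indices instead of 300, which is where the speed-up comes from; a 12-page PDF is simply scanned in full.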

See the attached text document for my test cases. Note that this release still makes no attempt to catch bad OCR scans (e.g. O instead of 0, I instead of 1, etc.). It also will not match numbers split across multiple lines, or text underneath graphics. I have also not yet optimised scanning of non-PDF formats.
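
For anyone wondering what catching bad OCR would involve, here is a minimal sketch, assuming the look-alike characters are limited to the ones mentioned above plus lowercase o and l. It is not something v1.2 does, and `OCR_FIXES` and `normalise_candidate` are hypothetical names. It should only ever be applied to a string already suspected of being an ISBN, since applying it to whole pages would mangle ordinary words.

```python
# Hypothetical mapping of letters that OCR commonly confuses with digits.
OCR_FIXES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1"})


def normalise_candidate(candidate):
    """Replace OCR look-alike letters with digits in a suspected ISBN string."""
    return candidate.translate(OCR_FIXES)
```

The normalised string would still need a check-digit test afterwards, because the substitution is a guess rather than a correction.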

It should, however, run significantly faster for PDFs and give you more matches than before.
Attached Files
File Type: txt TestISBN.txt (658 Bytes, 145 views)