04-03-2011, 03:41 PM   #49
kiwidude
Calibre Plugins Developer
v1.2 Released

Firstly, thanks to drMerry for the suggestions and testing in this thread. Feedback from several of you made it obvious that the original regex used in this plugin was extremely conservative. For this release I have used a variant of what drMerry proposed (it no longer looks for textual prefixes like ISBN), which significantly increases the match rate.
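
For anyone curious what prefix-free matching can look like, here is a minimal sketch. It is not the regex shipped in the plugin; the pattern and the names CANDIDATE, is_valid_isbn10, is_valid_isbn13 and find_isbns are made up for illustration. The idea is to accept any 10 to 13 digit run (hyphens or spaces allowed) and then let the standard ISBN check digit weed out false positives.

```python
import re

# Hypothetical pattern, NOT the plugin's actual regex: a run of 10 to 13
# digits with optional single hyphens/spaces between them. The last
# character may be X (ISBN-10 check digit). No "ISBN" prefix is required.
CANDIDATE = re.compile(r"(?<![0-9A-Za-z])(?:\d[ -]?){9,12}[\dXx](?![0-9A-Za-z])")


def is_valid_isbn10(s):
    """Check digit test for ISBN-10: weighted sum (10..1) divisible by 11."""
    if len(s) != 10:
        return False
    total = 0
    for i, ch in enumerate(s):
        if ch in "Xx":
            if i != 9:          # X is only legal as the final check digit
                return False
            value = 10
        elif ch.isdigit():
            value = int(ch)
        else:
            return False
        total += (10 - i) * value
    return total % 11 == 0


def is_valid_isbn13(s):
    """Check digit test for ISBN-13: alternating weights 1 and 3, mod 10."""
    if len(s) != 13 or not s.isdigit():
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(s))
    return total % 10 == 0


def find_isbns(text):
    """Return normalised ISBNs found in text, in order, without duplicates."""
    found = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group(0)).upper()
        if is_valid_isbn10(digits) or is_valid_isbn13(digits):
            if digits not in found:
                found.append(digits)
    return found
```

Dropping the prefix requirement means more random digit runs get picked up as candidates, which is why the checksum step matters in a sketch like this. Like the plugin, it does nothing about OCR errors or numbers split across lines.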

I have also replaced the PDF processing with something that is many orders of magnitude faster, by scanning only the first 10 and last 5 pages of a PDF.
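
As a rough illustration of the page selection only (this does not use the Calibre PDF engine, and the function name pages_to_scan and its parameters are hypothetical), picking which pages to scan could look something like this:

```python
def pages_to_scan(page_count, first=10, last=5):
    """Return sorted zero-based page indices for the first `first` and
    last `last` pages, without duplicates when the ranges overlap."""
    if page_count <= 0:
        return []
    head = range(0, min(first, page_count))
    tail = range(max(page_count - last, 0), page_count)
    return sorted(set(head) | set(tail))
```

For a 300 page PDF this gives 15 pages to extract text from instead of 300. Short documents, where the two ranges overlap or cover everything, simply get every page once.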

Changes in v1.2:
  • Rewritten for new plugin infrastructure in Calibre 0.7.53
  • ISBN matching regex replaced
  • PDFs now processed with the new Calibre PDF engine, scanning just the first 10 and last 5 pages

See the attached text document for my test cases. Note that this release still makes no attempt to catch bad OCR scans (e.g. O instead of 0, I instead of 1, etc.). It also will not match numbers split across multiple lines, or text underneath graphics. I have also not yet optimised scanning of non-PDF formats.

It should, however, run significantly faster for PDFs and give you more matches than before.
Attached Files
File Type: txt TestISBN.txt (658 Bytes, 705 views)