MobileRead Forums - View Single Post

kiwidude · 03-23-2011, 03:32 PM

@Talonius - I will push a 1.0.1 version shortly which will ensure any errors are more gracefully handled. It will also display progress in the status bar.

The optimization stuff is a tough one. The problem is that I have seen books where the copyright/ISBN information has been put at the end of the EPUB. Granted this is the exception rather than the rule, but maybe others have seen it frequently? This is the sort of operation that you will only do once on your books though so performance shouldn't be too much of an issue...

Also, I think most of the slowdown will be in the time taken to convert each book into text, not in the part where the plugin applies regular expressions to each file in it. I haven't profiled it, but I am pretty confident that will be the case.

What I have done is get it to short-circuit gathering ISBNs once it has found an ISBN and finished processing the current internal file of the converted format. The logic I "borrowed" from bazbar scanned the whole book and built up lists of ISBNs should a book have multiple ISBN13s for instance. I don't know enough about when that ever happens (most books I have seen have only either one or both of an ISBN10/ISBN13 but not more than that). Finishing processing a file (hopefully all ISBNs are on the same one) and then stopping should be enough. This won't help speed up books with no ISBN inside though.
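To make the short-circuit idea concrete, here is a rough sketch of it in Python. This is not the plugin's actual code: the regex, the checksum helper and the names `ISBN_RE`, `is_valid_isbn` and `find_isbns` are all made up for illustration, and it assumes the converted book arrives as (file name, text) pairs in reading order.

```python
import re

# Hypothetical pattern (an assumption, not the plugin's regex): a 13-digit ISBN
# starting 978/979, or a 10-character ISBN ending in a digit or X, with
# optional hyphens or spaces between the digits.
ISBN_RE = re.compile(r"\b(?:97[89][- ]?(?:\d[- ]?){9}\d|(?:\d[- ]?){9}[\dXx])\b")


def is_valid_isbn(isbn):
    """Check the ISBN-10 or ISBN-13 check digit of an already normalised string."""
    if len(isbn) == 13:
        total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(isbn))
        return total % 10 == 0
    if len(isbn) == 10:
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(isbn))
        return total % 11 == 0
    return False


def find_isbns(files):
    """Scan (name, text) pairs in order; stop after the first file with an ISBN.

    Every ISBN in that file is collected (so an ISBN10/ISBN13 pair on the same
    page is kept), but later files are never read.
    """
    for name, text in files:
        found = []
        for match in ISBN_RE.finditer(text):
            isbn = re.sub(r"[- ]", "", match.group()).upper()
            if is_valid_isbn(isbn) and isbn not in found:
                found.append(isbn)
        if found:
            return name, found  # short-circuit: skip the rest of the book
    return None, []  # no ISBN anywhere: the whole book was scanned
```

As the last line shows, a book with no ISBN still gets scanned end to end, which is why this doesn't help in that case.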

I am also about to make it so that if you ctrl+click or shift+click the toolbar button, it will decide non-interactively which format to interrogate when a book has several. For now this will be based on your preferred input format list in Preferences. I'll wait for suggestions for alternatives before doing anything else around that. For people whose formats were all produced by converting from the same source version, that will work well. Where it won't is if, say, they got a PDF from somewhere and an EPUB from somewhere else, and the EPUB has had the ISBN stuff removed. Still, at least you will see in the report which books it failed to find an ISBN for, and you can always then just do a normal toolbar button click to get the interactive choice of format to extract from.
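For anyone wondering what the non-interactive choice amounts to, it is roughly the following. This is only a sketch: `pick_format` and its arguments are hypothetical names, and how the preferred input format list is actually read from Preferences is left out. Returning `None` when nothing matches is my assumption here, not settled behaviour.

```python
def pick_format(available, preferred_order):
    """Return the first format in preferred_order that the book has.

    available: formats attached to the book, e.g. ["pdf", "epub"].
    preferred_order: the user's preferred input formats, most preferred first.
    Returns the chosen format in upper case, or None if nothing matches
    (assumption: the caller then reports the book or asks interactively).
    """
    have = {fmt.upper() for fmt in available}
    for fmt in preferred_order:
        if fmt.upper() in have:
            return fmt.upper()
    return None


# A book with both a PDF and an EPUB, where EPUB is ranked first:
choice = pick_format(["pdf", "epub"], ["EPUB", "MOBI", "PDF"])
```

Note this picks purely on format order, so it has no way of knowing that the EPUB in the example above had its ISBN stripped.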

Last edited by kiwidude; 03-23-2011 at 04:06 PM. Reason: Added more info about performance bottleneck