Old 07-06-2011, 01:12 PM   #181
drMerry
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
I have a book where no ISBN was found.
Copying and pasting the ISBN from the PDF into calibre was no problem.
Spoiler:
Starting job: Extract ISBN for 1 books
Running scan for isbn query with parameters:
{u'paths': [(u'PDF', u'C:\\local (laptop)\\Onbekend\\Great Book of Puzzles (18786)\\Great Book of Puzzles - Onbekend.pdf')], u'timeout': 30, u'title': u'Great Book of Puzzles'}
-------------------------------
Scanning: C:\local (laptop)\Onbekend\Great Book of Puzzles (18786)\Great Book of Puzzles - Onbekend.pdf
Scan time: 23.503000021 Great Book of Puzzles
The scan failed to find an isbn in 23.50 seconds
Failed to extract ISBN for Great Book of Puzzles
Scan complete, with 1 failures


I'll send the book by PM.
Old 07-06-2011, 02:09 PM   #182
kiwidude
calibre/Sigil Developer
Posts: 4,601
Karma: 2092290
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
@drMerry - my guess from looking at the PDF is the text is behind an image. The PDF conversion engine never picks up the text in that situation, so there is no ISBN to find. I would guess that if you tried to convert that PDF to an EPUB you would find that page was rendered as an image in the EPUB.
Old 07-06-2011, 05:55 PM   #183
drMerry
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
You're right.
Stupid of me not to think of that.
Thanks for looking.
Old 08-08-2011, 04:41 PM   #184
mobilemax
Member
Posts: 13
Karma: 68
Join Date: Aug 2011
Device: Kindle
timeout option?

Any chance of adding something like a "timeout" option to the script? I have had some books where the script just kept working for hours and never finished. Would it be possible to, say, stop the task on the current book after a specified time, e.g. 5 minutes maximum?

thanks!
Old 08-08-2011, 04:46 PM   #185
kiwidude
calibre/Sigil Developer
Posts: 4,601
Karma: 2092290
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
@mobilemax - The problem will not be the time taken to scan, but the time taken to convert to epub (which is calibre code) prior to the scan. You must have a particularly nasty book that Calibre is choking on. As for whether it would be possible to force a timeout, I don't know - I will add it to the list to take a look at one day.
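
For illustration only, here is a minimal sketch of the general "give up after N seconds" technique, assuming the slow step can be run as a separate command in a child process. This is not how the plugin invokes calibre's conversion code; `run_with_timeout` is a hypothetical helper, and the commands below are stand-ins (calibre does ship an `ebook-convert` command-line tool, but whether a timeout around it would suit the plugin is an open question).

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run command in a child process.

    Returns the exit code, or None if the time limit was exceeded.
    """
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return None
    return completed.returncode


if __name__ == "__main__":
    # Stand-in commands: one finishes at once, one simulates a stuck book.
    quick = [sys.executable, "-c", "pass"]
    stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(quick, 30))  # exit code of the quick command
    print(run_with_timeout(stuck, 1))   # None: abandoned after 1 second
```

Running each book in its own child process is what makes the timeout enforceable: a hung conversion can be killed without taking the rest of the batch down with it.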
Old 08-08-2011, 04:53 PM   #186
mobilemax
Member
Posts: 13
Karma: 68
Join Date: Aug 2011
Device: Kindle
Quote:
Originally Posted by kiwidude View Post
@mobilemax - The problem will not be the time taken to scan, but the time taken to convert to epub (which is calibre code) prior to the scan. You must have a particularly nasty book that Calibre is choking on. As for whether it would be possible to force a timeout, I don't know - I will add it to the list to take a look at one day.
Yep, I had quite a few, and since I decided to run the whole db through ExtractISBN, it's quite tedious to find that it just did not finish "these and those 500 books" and you have to find the bad ones and skip them ;-)

But I still love the script of course! ;-)

Thanks

Btw, is there any way of limiting which formats it will parse? E.g. I have .txt/.epub with the same contents because .epub was created from .txt and it would make sense to skip the .txt to make it quicker...

Last edited by mobilemax; 08-08-2011 at 05:02 PM.
Old 08-08-2011, 05:21 PM   #187
kiwidude
calibre/Sigil Developer
Posts: 4,601
Karma: 2092290
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
No way to limit it, nor would many people want to (since unless you do all your own conversions you wouldn't know they were the exact same content). Besides, I would expect TXT to be pretty quick anyway. It is formats like LRF and graphical PDFs that Calibre chokes on the most.
Old 08-08-2011, 05:31 PM   #188
jlutes
Connoisseur
Posts: 52
Karma: 12
Join Date: Jul 2011
Device: none
I do find this script useful but it seems to fail on a pretty regular basis. Perhaps it's a problem on my end so let's start with that.
I routinely get a Windows exception error stating:
AppName: calibre-parallel.exe AppVer: 0.8.13.0 ModName: unknown
ModVer: 0.0.0.0 Offset: 025b80b5
Once I see that message I know I'm done and I might as well kill the job. I have let it sit for over an hour and it never will finish. The real kicker is, unless I'm missing something, the ISBNs it did find aren't applied if you have to kill the job. Scanning 500 books and finding out it crashed at 98% just makes my skin crawl.
My question is, what, if anything, can I provide to help find and squash whatever is causing this?
Old 08-08-2011, 06:08 PM   #189
kiwidude
calibre/Sigil Developer
Posts: 4,601
Karma: 2092290
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
@jlutes - you need to figure out which book and format is causing your issue. My guess would be that is a problem with a PDF since that sort of crash is likely from C++ unmanaged code (which the PDF conversions use). If you find the book causing the crash, attach it with a bug report for Kovid to take a look at. There is nothing I or the plugin can do about this, it is calling existing Calibre code.
Old 08-08-2011, 08:03 PM   #190
jlutes
Connoisseur
Posts: 52
Karma: 12
Join Date: Jul 2011
Device: none
I went to try and figure out if a certain format was causing the problem and found an even more interesting phenomenon. If I highlight a group of 10 books and run Extract ISBN, I get the error I described earlier. However, I can choose each book individually and run Extract ISBN on each one and it never errors. Are we looking at the same thing? Still a call to existing Calibre code causing the problem?

* Update *
I got a virtual memory warning on my machine (the first I've ever seen) and found that there were about 50 Dr. Watson processes running, each of them tied to a calibre-parallel process. After I killed all of them and restarted Calibre, it appears that its attitude has changed. I am still getting Windows exception errors, but they aren't stopping the process.

Last edited by jlutes; 08-08-2011 at 11:29 PM.
Old 08-09-2011, 12:31 AM   #191
capnm
Groupie
Posts: 156
Karma: 10001
Join Date: Feb 2011
Device: sony
Hmmm...
IIRC, running this against just a couple of books it is run as part of the main Calibre process, but select several books (user configurable threshold) and it spawns a background worker process.

Oddly, I had several issues with memory leaks while running as part of the main process, but the spawned background jobs have always been well behaved on my machines. But I'm almost all epub & mobi files.

I wonder what would happen if you raised the threshold in the plugin configuration and tried that same group of 10 as a foreground process instead of as a background process ....

And are your books pdfs? Or ....?
Old 08-09-2011, 02:54 AM   #192
kiwidude
calibre/Sigil Developer
Posts: 4,601
Karma: 2092290
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
If you read back through this thread you would understand why it is different behaviour between running one versus multiple. Calibre has a major issue with memory leaks in the conversion process, so to work around this conversions should be done in the background. However if you are just doing a single ad hoc extract ISBN (which is how I usually tend to work) then for speed reasons I don't run it as a background job if you select only a single book.

It sounds like Calibre is crashing on your books when doing the extract when running in the background. No-one else has reported any issues with this, so I am inclined to believe it is something about the books you are scanning. You need to figure out the format that is causing the issue - duplicate the books (create empty books then merge in keeping the original), then one by one remove likely problem formats (starting with PDF) to see if it still errors.
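
The elimination procedure described above can be sketched roughly as follows, assuming a single format is to blame. `find_problem_format` and the `scan_fails` callback are hypothetical names for illustration; in practice "scan_fails" means removing the format from the duplicated book by hand and re-running the extract to see whether it still errors.

```python
def find_problem_format(formats, scan_fails, suspects=("PDF", "LRF")):
    """Return the one format whose removal makes the scan succeed, or None.

    formats    -- formats attached to the (duplicated) book, e.g. ["EPUB", "PDF"]
    scan_fails -- callback taking a list of formats; True if the scan still errors
    suspects   -- formats to try removing first, most likely culprit first
    """
    # Try the likely culprits first, then everything else in its original order.
    ordered = [f for f in suspects if f in formats]
    ordered += [f for f in formats if f not in suspects]
    for fmt in ordered:
        remaining = [f for f in formats if f != fmt]
        if not scan_fails(remaining):
            return fmt
    return None


if __name__ == "__main__":
    # Pretend the LRF file is the one that crashes the conversion.
    def pretend_scan_fails(remaining_formats):
        return "LRF" in remaining_formats

    print(find_problem_format(["EPUB", "LRF", "PDF"], pretend_scan_fails))
```

If no single removal cures the error, the function returns None, which would suggest either more than one bad format or a cause unrelated to the formats.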
Old 08-13-2011, 06:35 AM   #193
Ababakar
Member
Posts: 23
Karma: 10
Join Date: Aug 2011
Device: none
As I'm just starting out with calibre and am in the process of editing all my added books, I ran into a few things regarding ISBN extraction.

First one:
Tricky: I have some books from an edition with several other volumes. Those are listed on page 2 (with their ISBNs), while the ISBN of the actual book is on page 3. I don't know if there is a solution for that (maybe a hint if more than one ISBN is found). But anyway: don't trust the extraction blindly.
Not to say that I don't like your work, kiwi - just a reminder that things are never perfect.
Edit: I just read the whole thread: this is much the same as what was mentioned in posts #63 and #74, so I guess it has already been discussed. Just wanted to mention it.

Second one:
In some of my books there is no word "ISBN" + following number, but the whole phrase ("International Standard Book Number" + following number). Maybe this term could be included in later versions.

Third one:
It took me some hours to discover a search option as nice as isbn:false - so maybe you should put this in the plugin's FAQ or something. But now that I know it, it doesn't bother me any more.

Last one:
Great plugin.

The only thing left for me as a new user is to find a way to easily add my comic collection (cbr + cbz). But that's another topic.
Old 08-13-2011, 06:47 AM   #194
kiwidude
calibre/Sigil Developer
Posts: 4,601
Karma: 2092290
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
@Ababakar - welcome to MobileRead.

Yes, as you will have read repeatedly throughout this thread, there is no magic bullet for grabbing the ISBNs, and there will always be the odd situation where it either cannot find it or gets the wrong one if there are multiple. However, these are the exception rather than the norm.

As for your cbr/cbz files - this plugin does not look for the word "ISBN". The very first implementation did look for preceding words; however, due to so many variations (and to cater for bad-quality OCR scan errors), it now just looks for a sequence of numbers that starts with the right prefix for ISBNs and validates as an ISBN. If it cannot find such a sequence in your comic books, my guess is that they are images rather than text, which the plugin cannot scan. You can see from the log which numbers in the text it attempted to match on, and you can always do a conversion to ePub to verify for yourself what "text" the plugin found available to scan (since for all but PDFs that is exactly what the plugin is doing in the background - silently doing a conversion to ePub and then scanning the HTML pages for text). If your comic shows up as an EPUB containing image files where the ISBN is, then that proves the plugin will be unable to extract it.
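
To make the "sequence of numbers that validates as an ISBN" idea concrete, here is a small sketch. It is not the plugin's actual code: the regular expression and the function names are assumptions, and the checks are the standard ISBN rules (a 978 or 979 prefix plus a mod-10 check digit for 13-digit numbers, a mod-11 check for 10-character ones).

```python
import re

# A run of 10 to 13 digits, allowing a single hyphen or space between digits
# and a trailing X (the ISBN-10 check character standing for 10).
CANDIDATE = re.compile(r"(?<!\d)\d(?:[- ]?\d){8,11}[- ]?[\dXx](?!\d)", re.ASCII)


def is_valid_isbn10(s):
    if len(s) != 10 or not s[:9].isdigit() or s[9] not in "0123456789X":
        return False
    digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    # Weights run from 10 down to 1; the total must be divisible by 11.
    return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0


def is_valid_isbn13(s):
    if len(s) != 13 or not s.isdigit() or s[:3] not in ("978", "979"):
        return False
    # Weights alternate 1, 3, 1, 3, ...; the total must be divisible by 10.
    return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s)) % 10 == 0


def find_isbns(text):
    """Return the distinct valid ISBNs in text, in order of first appearance."""
    found = []
    for match in CANDIDATE.finditer(text):
        compact = re.sub(r"[- ]", "", match.group()).upper()
        valid = is_valid_isbn10(compact) or is_valid_isbn13(compact)
        if valid and compact not in found:
            found.append(compact)
    return found


if __name__ == "__main__":
    sample = ("Also in this series: 0-306-40615-2. "
              "This volume: 978-0-306-40615-7. Phone 555-123-4567.")
    print(find_isbns(sample))  # ['0306406152', '9780306406157']
```

The phone number is rejected because it fails the check-digit test, which is why no preceding "ISBN" keyword is needed. When more than one number comes back, as in the sample, the text alone cannot say which belongs to the book in hand, so the result is worth checking by eye.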
Old 08-13-2011, 10:13 AM   #195
Ababakar
Member
Posts: 23
Karma: 10
Join Date: Aug 2011
Device: none
Oh sorry - I didn't notice that it no longer looks for phrases, just for the digits.
Anyway, as I am sorting my collection: I have some books where the ISBN could not be extracted but can easily be found via Okular (the KDE PDF viewer), so the text is OCR'd and not only an image. Plus, the ISBNs are within the first 10 pages.
By the way, I also have a lot of OCR'd DjVu files where the ISBN could not be extracted but can be found via Ctrl+F in my PDF viewer. Did I read correctly that DjVu files won't work at all?
Anyway: if you want, I can collect those PDFs (and DjVu files) for you (I will simply print the single page where I find the ISBN with CUPS (the Linux PDF printer) to keep the data size small). But as I am doing about "5 books a day", this may take a while.

As for cbr/cbz - I know - this was more a general comment than one about ISBN extraction. I only wanted to say that I am not yet sure whether calibre can help me with those. But as I said, I may discuss this in another topic.

Last edited by Ababakar; 08-13-2011 at 10:15 AM.