07-06-2011, 01:12 PM | #181 |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
I got a book where no ISBN was found.
Copying and pasting the ISBN from the PDF into calibre was no problem. Spoiler:
I'll send the book by pm |
07-06-2011, 02:09 PM | #182 |
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@drMerry - my guess from looking at the PDF is the text is behind an image. The PDF conversion engine never picks up the text in that situation, so there is no ISBN to find. I would guess that if you tried to convert that PDF to an EPUB you would find that page was rendered as an image in the EPUB.
|
07-06-2011, 05:55 PM | #183 |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
You're right.
Stupid of me not to think of that. Thanks for looking. |
08-08-2011, 04:41 PM | #184 |
Member
Posts: 13
Karma: 68
Join Date: Aug 2011
Device: Kindle
|
timeout option?
Any chance of adding something like a "timeout" option to the script? I have had some books where the script just kept working for hours and never finished. Would it be possible to, say, stop the task on the current book after a specified time, e.g. 5 minutes maximum?
thanks! |
08-08-2011, 04:46 PM | #185 |
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@mobilemax - The problem will not be the time taken to scan, but the time taken to convert to epub (which is calibre code) prior to the scan. You must have a particularly nasty book that Calibre is choking on. As for whether it would be possible to force a timeout, I don't know - I will add it to the list to take a look at one day.
|
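For anyone who wants to experiment with the timeout idea outside the plugin, here is a minimal sketch. It is not the plugin's code and not how calibre runs its jobs. It only shows the general technique of running a slow step as a separate process and giving up after a fixed number of seconds. The function name `run_with_timeout` and the file names in the usage comment are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd as a child process and report how it ended.

    Returns "ok" if it exited with status 0, "failed" if it exited with
    a non-zero status, and "timeout" if it was still running after
    timeout_s seconds (subprocess.run kills the child in that case).
    """
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


# Hypothetical usage with calibre's command-line converter, allowing
# 5 minutes per book (file names are made up for illustration):
#   run_with_timeout(["ebook-convert", "book.pdf", "book.epub"], 300)

# Self-contained demonstration using the Python interpreter itself:
quick = run_with_timeout([sys.executable, "-c", "pass"], 30)
slow = run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 1)
print(quick, slow)
```

A book that reports "timeout" can then be skipped and noted for a closer look, instead of blocking the rest of the batch.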
08-08-2011, 04:53 PM | #186 | |
Member
Posts: 13
Karma: 68
Join Date: Aug 2011
Device: Kindle
|
Quote:
But I still love the script of course! ;-) Thanks
Btw, is there any way of limiting which formats it will parse? E.g. I have .txt/.epub with the same contents because the .epub was created from the .txt, and it would make sense to skip the .txt to make it quicker...
Last edited by mobilemax; 08-08-2011 at 05:02 PM. |
|
08-08-2011, 05:21 PM | #187 |
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
No way to limit it, nor would many people want to (since unless you do all your own conversions you wouldn't know they were the exact same content). Besides, I would expect TXT to be pretty quick anyway. It is formats like LRF and graphical PDFs that Calibre chokes on the most.
|
08-08-2011, 05:31 PM | #188 |
Connoisseur
Posts: 52
Karma: 12
Join Date: Jul 2011
Device: none
|
I do find this script useful but it seems to fail on a pretty regular basis. Perhaps it's a problem on my end so let's start with that.
I routinely get a Windows exception error stating:
AppName: calibre-parallel.exe AppVer: 0.8.13.0 ModName: unknown ModVer: 0.0.0.0 Offset: 025b80b5
Once I see that message I know I'm done and I might as well kill the job. I have let it sit for over an hour and it never will finish. The real kicker is, unless I'm missing something, the ISBNs it did find aren't applied if you have to kill the job. Scanning 500 books and finding out it crashed at 98% just makes my skin crawl.
My question is, what, if anything, can I provide to help find and squash whatever is causing this? |
08-08-2011, 06:08 PM | #189 |
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@jlutes - you need to figure out which book and format is causing your issue. My guess would be that it is a problem with a PDF, since that sort of crash is likely from C++ unmanaged code (which the PDF conversions use). If you find the book causing the crash, attach it to a bug report for Kovid to take a look at. There is nothing I or the plugin can do about this; it is calling existing Calibre code.
|
08-08-2011, 08:03 PM | #190 |
Connoisseur
Posts: 52
Karma: 12
Join Date: Jul 2011
Device: none
|
I went to try and figure out if a certain format was causing the problem and found an even more interesting phenomenon. If I highlight a group of 10 books and run Extract ISBN, I get the error I described earlier. However, I can choose each book individually and run Extract ISBN on each one and it never errors. Are we looking at the same thing? Still a call to existing Calibre code causing the problem?
* Update * I got a virtual memory warning on my machine (first I've ever seen) and found that there were about 50 Dr. Watson processes running, each one of them tied to a calibre-parallel process. After I killed all of them and restarted Calibre it appears that its attitude has changed. I am still getting Windows Exception errors but they aren't stopping the process. Last edited by jlutes; 08-08-2011 at 11:29 PM. |
08-09-2011, 12:31 AM | #191 |
Groupie
Posts: 156
Karma: 10001
Join Date: Feb 2011
Device: sony
|
Hmmm...
IIRC, when you run this against just a couple of books it runs as part of the main Calibre process, but select several books (user-configurable threshold) and it spawns a background worker process. Oddly, I had several issues with memory leaks while running as part of the main process, but the spawned background jobs have always been well behaved on my machines. But my library is almost all epub & mobi files. I wonder what would happen if you raised the threshold in the plugin configuration and tried that same group of 10 as a foreground process instead of as a background process .... And are your books pdfs? Or ....? |
08-09-2011, 02:54 AM | #192 |
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
If you read back through this thread you would understand why it is different behaviour between running one versus multiple. Calibre has a major issue with memory leaks in the conversion process, so to work around this conversions should be done in the background. However if you are just doing a single ad hoc extract ISBN (which is how I usually tend to work) then for speed reasons I don't run it as a background job if you select only a single book.
It sounds like Calibre is crashing on your books when doing the extract when running in the background. No-one else has reported any issues with this, so I am inclined to believe it is something about the books you are scanning. You need to figure out the format that is causing the issue - duplicate the books (create empty books then merge in keeping the original), then one by one remove likely problem formats (starting with PDF) to see if it still errors. |
08-13-2011, 06:35 AM | #193 |
Member
Posts: 23
Karma: 10
Join Date: Aug 2011
Device: none
|
As I'm just starting to use calibre and am in the process of editing all my added books, I encountered some things regarding ISBN extraction.
First one: Tricky: I got some books out of an edition with some other volumes. These are mentioned on page 2 (with their ISBNs) - the ISBN of the actual book was on page 3. Don't know if there is a solution for that (maybe a hint if more than one ISBN is found). But anyway: don't trust the extraction blindly. Not to say that I don't like your work, kiwi - just a reminder that things are never perfect.
edit: just read the whole thread: this is kind of the same as mentioned in posts #63 and #74 - so I guess this has already been discussed. Just wanted to mention it.
Second one: in some of my books there is not the word "ISBN" + following number but the whole thing ("International Standard Book Number" + following number). Maybe this term can be included in later versions.
Third one: it took me some hours to figure out such a nice search option as isbn:false - so maybe you should place this in the FAQs of the plugin or something. But as I know it now it doesn't bother me anymore.
Last one: great plugin. Only thing left for me as a new user is to try to find a way to easily add my comic collection (cbr+cbz). But that's another topic. |
08-13-2011, 06:47 AM | #194 |
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@Ababakar - welcome to MobileRead.
Yes, as you will have read repeatedly through this thread, there is no magic bullet for grabbing the ISBNs, and there will always be the odd situation where it either cannot find it or gets the wrong one if there are multiple. However these are the exception rather than the norm.
As for your cbr/cbz files - this plugin does not look for the word "ISBN" - the very first implementation did look for preceding words, however due to so many variations (and to cater for bad quality OCR scan errors) it now just looks for a sequence of numbers that starts with the right prefix for ISBNs and validates as an ISBN. If it cannot find such in your comic books my guess is that they are images rather than text, which the plugin cannot scan.
You can see from the log which numbers in the text it did attempt to match on, and you can always do a conversion to ePub to verify for yourself what "text" the plugin found available to scan (since for all but PDFs that is exactly what the plugin is doing in the background - silently doing a conversion to ePub and then scanning the html pages for text). If your comic shows up as an EPUB containing image files where the ISBN is then that proves the plugin will be unable to extract it. |
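To illustrate the "look for digit sequences that validate as an ISBN" approach described above, here is a rough sketch. It is not the plugin's actual code: the names `find_isbns`, `is_valid_isbn10` and `is_valid_isbn13` and the regular expression are assumptions made for this example. It accepts 13-digit numbers starting with 978 or 979 that pass the ISBN-13 check digit, and 10-character numbers that pass the ISBN-10 check digit (ISBN-10 has no fixed prefix, so only the checksum is tested there). Hyphens inside the number are tolerated; spaces are not.

```python
import re

# A run of digits and hyphens, 10 to 17 characters long, optionally
# ending in X (the ISBN-10 check character for the value 10).
CANDIDATE = re.compile(r"\b[0-9][0-9-]{8,15}[0-9Xx]\b")


def is_valid_isbn10(s):
    """Weighted sum with weights 10..1 must be divisible by 11."""
    if len(s) != 10 or not s[:9].isdigit() or s[9] not in "0123456789X":
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(s))
    return total % 11 == 0


def is_valid_isbn13(s):
    """Prefix must be 978/979; alternating 1,3 weighted sum divisible by 10."""
    if len(s) != 13 or not s.isdigit() or s[:3] not in ("978", "979"):
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(s))
    return total % 10 == 0


def find_isbns(text):
    """Return every distinct valid ISBN in text, in order of appearance."""
    found = []
    for match in CANDIDATE.finditer(text):
        digits = match.group().replace("-", "").upper()
        if (is_valid_isbn13(digits) or is_valid_isbn10(digits)) and digits not in found:
            found.append(digits)
    return found


sample = (
    "Also in this series: ISBN 978-0-306-40615-7. "
    "This volume: 0306406152. Order line: 1234567890123."
)
print(find_isbns(sample))
```

Returning every match in page order, rather than only the first, also makes the "other volumes listed on page 2" situation visible: when more than one ISBN comes back, the result deserves a manual check.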
08-13-2011, 10:13 AM | #195 |
Member
Posts: 23
Karma: 10
Join Date: Aug 2011
Device: none
|
Oh sorry - I didn't notice that it no longer looks for phrases - just for the digits.
Anyway - as I am sorting my collection: I got some books where the ISBN could not be extracted but can easily be found via Okular (KDE PDF viewer - so it is OCR'd and not only an image). Plus they are on the first 10 pages. By the way - I also got a lot of OCR'd DjVus where they could not be extracted but were found via Ctrl+F in my PDF viewer. Did I read right that DjVus won't work at all?
Anyway: if you want I can collect those PDFs (and DjVus) for you (I will simply print out the single page where I find the ISBN with CUPS (Linux PDF printer) to keep the data size small). But as I am doing like "5 books a day" this may take a while.
As for cbr/cbz - I know - this was more a general comment than regarding ISBN extraction. Only wanted to tell that I am not yet sure if calibre can help me with those ones. But as said - I may discuss this in another topic. Last edited by Ababakar; 08-13-2011 at 10:15 AM. |
|