04-03-2011, 04:10 AM   #46
Loeffel
Connoisseur
Posts: 58
Karma: 10
Join Date: Mar 2011
Device: Kindle 3 3G
OK, it just takes that long: the computer says Calibre is not responding and the counter stays on 1.
I just noticed that some books have an ISBN (like 3-442-04.273-9) that wasn't found, even though the text contains two numbers:

Unabridged edition • Made in Germany © 1973 by Sara Woods. Translated from the English by Tony Westermayr. All rights reserved, including those of photomechanical reproduction. Any reprint requires the permission of the publisher. Cover: photo by Gilles Lagarde. Set in Linotype Garamond Antiqua.
Printing: Presse-Druck Augsburg. K 888/KR1MI 4273 • Sch.’Hu Hardcover edition ISBN 3-442-25.888-X
Paperback edition ISBN 3-442-04.273-9

Is that why the plugin doesn't come up with an ISBN? I can search for it, but it always reports that there is none in the text.
04-03-2011, 04:31 AM   #47
kiwidude
calibre/Sigil Developer
Posts: 4,224
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
@Loeffel - the reason it doesn't find that ISBN is that it contains a mixture of dashes and periods. Earlier posts asked whether anyone had seen this situation; you have obviously now found such a case (most likely because it is a European edition of the book, judging by your pasted text).

As for your "counter stays on 1" comment, do you mean it only finds one ISBN across all your books? If so, yes, that is due to the regex.

As I have posted several times, I took the regex used by this plugin from someone else's extract-ISBN script, on the assumption that it was the product of many people's attempts over time. Clearly it was not as "proven" as we would like, given these variations from you and drMerry. I'll take a fresh look at that part over the next few days, including what drMerry has been experimenting with.

What would be extremely useful is a list of test-case ISBNs showing the variations people have seen. Please post them, either as a text file attachment or directly in the thread; either is fine. That way I can make sure the next implementation caters for your examples.
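To make the mixed-separator problem concrete: a pattern that allows any single dash, period or space between digits, in any mixture, would pick up both numbers from Loeffel's imprint page. The sketch below is an illustration under that assumption only; it is not the regex the plugin ships, and `ISBN10_CANDIDATE` and `find_isbn10_candidates` are hypothetical names. Because it is looser than a uniform-separator pattern, it will also accept some digit runs that are not ISBNs, so candidates would still need a check-digit test.

```python
import re

# Illustrative pattern only, NOT the plugin's actual regex: nine digits, each
# optionally followed by a single dash, period or space (in any mixture),
# then a check character that is a digit or X.
ISBN10_CANDIDATE = re.compile(r"(?<![\dXx])(?:\d[-. ]?){9}[\dXx](?![\dXx])")


def find_isbn10_candidates(text):
    """Return ISBN-10 candidates found in text, with separators stripped."""
    return [re.sub(r"[-. ]", "", m.group(0)).upper()
            for m in ISBN10_CANDIDATE.finditer(text)]


sample = ("Hardcover edition ISBN 3-442-25.888-X\n"
          "Paperback edition ISBN 3-442-04.273-9")
print(find_isbn10_candidates(sample))  # ['344225888X', '3442042739']
```

The lookbehind and lookahead stop the pattern from matching a ten-digit slice out of the middle of a longer digit run; they do not stop it from matching inside an ISBN-13 that happens to use separators, which is another reason to validate the check digit afterwards.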

Last edited by kiwidude; 04-03-2011 at 04:33 AM.
04-03-2011, 11:04 AM   #48
theducks
Grand Sorcerer
Posts: 14,530
Karma: 5567087
Join Date: Aug 2009
Location: The (original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by kiwidude
@Loeffel - the reason it doesn't find that ISBN is that it contains a mixture of dashes and periods. Earlier posts asked whether anyone had seen this situation; you have obviously now found such a case (most likely because it is a European edition of the book, judging by your pasted text).
A mixture of dashes and dots or spaces was not in the spec I read a long time ago. Any single separator style was permitted.

Language-Publisher-Book_number-Check_digit

where Language and Check_digit are single characters and the others add up to 8 characters.
Are they trying to sneak the publisher's 'imprint' encoding into the Book_number?
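Whatever the separators, an ISBN-10 can be checked on its own digits: the first nine are weighted 10 down to 2, and the check character is chosen so that the weighted total (with the check digit weighted 1) is divisible by 11, with a value of 10 written as X. This holds however the digits are split between group, publisher and title (the group identifier is a single digit for German-language books starting with 3, like Loeffel's, but longer for some other regions). A minimal sketch; the function names are hypothetical and not taken from the plugin:

```python
DECIMAL_DIGITS = set("0123456789")


def isbn10_check_char(first_nine):
    """Compute the ISBN-10 check character for a string of nine digits."""
    # Weights run from 10 (first digit) down to 2 (ninth digit).
    total = sum((10 - i) * int(d) for i, d in enumerate(first_nine))
    # Choose the check value so that total + check is divisible by 11.
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn10(isbn):
    """Validate an ISBN-10, ignoring any mixture of dashes, periods and spaces."""
    digits = isbn.replace("-", "").replace(".", "").replace(" ", "").upper()
    if len(digits) != 10 or not set(digits[:9]) <= DECIMAL_DIGITS:
        return False
    return isbn10_check_char(digits[:9]) == digits[9]


# Both numbers from Loeffel's imprint page pass, despite the mixed separators.
print(is_valid_isbn10("3-442-04.273-9"))  # True
print(is_valid_isbn10("3-442-25.888-X"))  # True
print(is_valid_isbn10("3-442-04.273-2"))  # False
```

So a mixed-separator number is not malformed as such; the separators are purely cosmetic and a checksum test is indifferent to them.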
04-03-2011, 03:41 PM   #49
kiwidude
calibre/Sigil Developer
v1.2 Released

First, thanks to drMerry for the suggestions and testing in this thread. Several of you have made it obvious that the original regex used in this plugin was extremely conservative. For this release I have used a variant of what drMerry proposed (it no longer looks for textual prefixes like "ISBN"), which significantly increases the match rate.

I have also replaced the PDF processing with something many orders of magnitude faster, which scans only the first 10 and last 5 pages of a PDF.
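Choosing which pages to hand to the PDF engine is simple index arithmetic. A minimal sketch, assuming 0-based page indices; `pages_to_scan` is a hypothetical helper, and the actual calibre PDF engine call is deliberately not shown:

```python
def pages_to_scan(page_count, head=10, tail=5):
    """Return sorted 0-based indices of the first `head` and last `tail` pages.

    Overlapping ranges (short documents) are merged so no page is read twice.
    """
    first = range(min(head, page_count))
    last = range(max(page_count - tail, 0), page_count)
    return sorted(set(first) | set(last))


print(pages_to_scan(40))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 35, 36, 37, 38, 39]
```

Merging the two ranges matters for short PDFs: a 12-page document yields pages 0 to 11 once each, rather than reading pages 7 to 9 twice.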

Changes in v1.2:
  • Rewritten for new plugin infrastructure in Calibre 0.7.53
  • ISBN matching regex replaced
  • PDFs now processed with new Calibre PDF engine to scan just first 10 and last 5 pages

See the attached text document for my test cases. Note that this release still makes no attempt to catch bad OCR scans (e.g. O instead of 0, I instead of 1, etc.). It also will not match numbers split across multiple lines, or text underneath graphics. I have also not yet optimised scanning of non-PDF formats.

It should however run significantly faster for PDFs and give you more matches than previously.
Attached file: TestISBN.txt (658 bytes)
04-03-2011, 06:38 PM   #50
Loeffel
Connoisseur
I will look for other ISBN formats that differ from the one I posted and from those in the text file.

What I meant by the 1 is this: if I scan only a few books for an ISBN, the counter shows which book is being scanned. If there is a large number to scan, it shows 1 until the scan is finished.
I can say exactly how many books have only one format: just 4.
- 1 empty entry (a placeholder for a book that will be published in May and which I've already bought)
- 3 dictionaries, which I exclude from all conversions and the like, as they are large and there is nothing to be found in them

Edit: all other books have at least two formats (EPUB and MOBI).

Nevertheless, this is a good plugin. I have some suggestions for it, but I need a little time to write down what I mean.

Last edited by Loeffel; 04-03-2011 at 06:41 PM. Reason: forgot something
04-04-2011, 12:46 AM   #51
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
The new PDF functionality seems to be broken; I get this for any PDF:
Code:
calibre, version 0.7.53
Extract ISBN complete: Selected 1 books
Found 0 ISBN values
Updated 0 books

See details for more information

[book title here] - ERROR: (<type 'exceptions.TypeError'>, TypeError('function takes exactly 1 argument (3 given)',))
04-04-2011, 04:08 AM   #52
kiwidude
calibre/Sigil Developer
@ldolse - you will need to reinstall the binaries for the 0.7.53 release (I believe you are running from source). Sorry for not mentioning that; Kovid recompiled the reflow C++ app.
04-04-2011, 05:27 AM   #53
ldolse
Wizard
Ouch - the first PDF I tried caused a segfault (nothing to do with your plugin, more a poppler issue).

Fortunately, the other PDFs I've tried have all been OK. I will open a bug for the problem PDF.
04-04-2011, 05:32 AM   #54
kiwidude
calibre/Sigil Developer
Glad you are up and (mostly) running now.

Indeed, by using the "still work in progress" new PDF engine in this plugin, there was always going to be a risk of hitting some issues. Apart from the odd PDF that produced an immense amount of debug output, I hadn't had any crashes like yours, but then I haven't exactly thrashed it.

Still, I'm sure Kovid will be "delighted" to find and fix them.
04-05-2011, 02:33 PM   #55
drMerry
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
Great performance boost.
Really nice: faster and more successful.
Thanks.

But there is one problem with the new PDF parser: Calibre crashes. In my case it was always the bigger PDF files (10 - 170 MB).

Problem signature:
Problem event name: APPCRASH
Application name: calibre.exe
Application version: 0.7.53.0
Application timestamp: 4d961400
Fault module name: pdfreflow.pyd
Fault module version: 0.0.0.0
Fault module timestamp: 4d9613dc
Exception code: c0000005
Exception offset: 00005e18
OS version: 6.1.7601.2.1.0.256.1
Locale ID: 1043
Additional information 1: 0a9e
Additional information 2: 0a9e372d3b4ad19135b953a78882e789
Additional information 3: 0a9e
Additional information 4: 0a9e372d3b4ad19135b953a78882e789

One other question: is it possible to show the progress in an option pane?
At the moment, Calibre just waits for the plugin to finish, so I can't (really) use Calibre during scans.
If it were in an option pane, I could hit Cancel (or run it in the background, keeping the current implementation if that function is useful for others).

But, thanks again for your work.
04-05-2011, 03:18 PM   #56
kiwidude
calibre/Sigil Developer
@drMerry

I would suggest (unless Kovid says otherwise) that you put any issues with the new PDF engine on the bug tracker, attaching the PDF for Kovid to look at. It sounds like the new PDF engine isn't being actively developed right now, but at least Kovid would be able to replicate the issue whenever he next works on it.

As for running the scan in the background, that isn't going to happen any time soon, I'm afraid. The only background processing mechanism I have seen in Calibre is the "Jobs" system used when you convert books. However, the risk with it (and possibly the same reason why things like download metadata don't run in the background) is that you get all the concurrency issues of two different things updating the same book record at the same time. I don't know if you have ever noticed, but it is possible to lose your newly converted book from a job if you happen to be editing metadata for the same book when the job completes.

Maybe this is something Kovid plans to address in future, for example with some sort of optimistic or pessimistic locking mechanism that would prevent you from editing a book while a job is running for it. If he does, I could certainly look at revisiting this. Right now, I don't want to risk any database corruption from a user being allowed to edit a book manually while its ISBN is being updated in the background.
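For readers unfamiliar with the term: with optimistic locking, each record carries a version number; a writer remembers the version it read, and its save is rejected if someone else saved in the meantime. The toy sketch below only illustrates that idea; `BookStore` and `StaleRecordError` are hypothetical and do not model calibre's database API.

```python
import threading


class StaleRecordError(Exception):
    """Raised when a record was changed by someone else after it was read."""


class BookStore:
    """Toy in-memory store using a per-record version counter."""

    def __init__(self):
        self._lock = threading.Lock()  # protects the dict; held only briefly
        self._books = {}               # book_id -> (version, metadata)

    def add(self, book_id, metadata):
        with self._lock:
            self._books[book_id] = (0, dict(metadata))

    def read(self, book_id):
        with self._lock:
            version, metadata = self._books[book_id]
            return version, dict(metadata)

    def write(self, book_id, expected_version, metadata):
        with self._lock:
            current, _ = self._books[book_id]
            if current != expected_version:
                raise StaleRecordError(book_id)
            self._books[book_id] = (current + 1, dict(metadata))


store = BookStore()
store.add(1, {"title": "Example"})
job_version, job_meta = store.read(1)     # background ISBN job reads version 0
user_version, user_meta = store.read(1)   # user opens the same book meanwhile
user_meta["title"] = "Edited title"
store.write(1, user_version, user_meta)   # user's save succeeds, version is now 1
job_meta["isbn"] = "3442042739"
try:
    store.write(1, job_version, job_meta)  # stale version 0, so it is rejected
except StaleRecordError:
    print("conflict: re-read the book and retry")
```

Neither side silently overwrites the other: the job re-reads and retries. A pessimistic scheme would instead mark the book as locked for the whole job, which is closer to preventing the user from editing at all.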
04-05-2011, 07:08 PM   #57
ldolse
Wizard
That crash might already be fixed. I'd suggest waiting until the next release and checking again: Kovid has already fixed the crash I mentioned a few days ago, and it may be the same one. I can't tell for sure from your error log, as it's a different language/OS, but there is a good chance it's the same.
04-08-2011, 04:19 AM   #58
drMerry
Addict
Quote:
Originally Posted by kiwidude
@drMerry
As for running the scan in the background, that isn't going to happen anytime soon I'm afraid. The only background processing mechanism I have seen in Calibre is the "Jobs" stuff that gets used for when you convert books. However the risk with it (and possibly the same reason why stuff like download metadata doesn't run in the background) is that you have all the concurrency issues of two different things updating the same book record at the same time. I don't know if you have ever noticed but it is possible to lose your newly converted book from a job if you happen to be editing metadata for the same book at the same time the job completes.
Is it possible to use a dialog box for the process, so you could tell Calibre to stop it (either immediately or after the current PDF has been checked)?

I once selected all my books and started the plugin by mistake.
I can tell you, it takes a long time to check all 3250+ books (stored on a network drive that is also used by other processes, and with the old PDF engine parsing every PDF file completely; books of 1400+ pages are no rarity in my library).

So an abort option would be welcome.
04-08-2011, 04:31 AM   #59
kiwidude
calibre/Sigil Developer
It already has an abort option for interactive usage (i.e. when you are asked which format to scan because a book has several).

When it is operating non-interactively, there is currently no way to stop it other than killing Calibre. As this is not the sort of thing you would run repeatedly on your whole book collection, I figured I could get away with it for a while.

Putting up a dialog and running the scan in the background is obviously possible; it just involves a lot more development, plus some threading, which is fraught with potential to go horribly wrong in Python/Qt if done badly.
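The "stop after the current book" behaviour drMerry asked for can be sketched with a worker thread and a shared flag. This is a minimal illustration using only Python's `threading` module; `scan_books` and `fake_scan` are hypothetical. In a real Qt dialog the worker must not touch widgets directly (Qt widgets may only be used from the GUI thread), so progress and results would have to be handed back to the main thread, for example via signals.

```python
import threading


def scan_books(book_ids, scan_one, cancel_event, results):
    """Scan books in order, stopping between books once cancel_event is set.

    The book currently being scanned always finishes; the flag is only
    checked before starting the next one.
    """
    for book_id in book_ids:
        if cancel_event.is_set():
            break
        results[book_id] = scan_one(book_id)


cancel = threading.Event()
found = {}


def fake_scan(book_id):
    # Stand-in for the real per-book work; simulates the user pressing
    # Cancel while book 2 is being scanned.
    if book_id == 2:
        cancel.set()
    return None


worker = threading.Thread(target=scan_books,
                          args=(range(3250), fake_scan, cancel, found))
worker.start()
worker.join()
print(sorted(found))  # [0, 1, 2]
```

Checking the flag only between books trades a slightly slower response for never leaving a half-processed book behind.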

It's on the future wishlist to take a look at.
04-09-2011, 08:50 PM   #60
kiwidude
calibre/Sigil Developer
v1.2.1 Released

Changes in this release:
  • Support skinning of icons by putting them in a plugin name subfolder of local resources/images
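The lookup order implied by this note is: a user override first, the bundled icon otherwise. A sketch of that resolution, assuming the folder layout described; `resolve_icon` and the icon file name `isbn.png` are hypothetical, and how calibre locates its local resources folder is not shown.

```python
import os
import tempfile


def resolve_icon(local_resources_dir, plugin_name, icon_name, bundled_icon):
    """Return the user's skinned icon if present, else the bundled icon path.

    Assumed override layout: <local resources>/images/<plugin name>/<icon name>
    """
    override = os.path.join(local_resources_dir, "images", plugin_name, icon_name)
    return override if os.path.isfile(override) else bundled_icon


# Demonstration against a throwaway directory standing in for local resources.
with tempfile.TemporaryDirectory() as resources:
    print(resolve_icon(resources, "Extract ISBN", "isbn.png", "bundled/isbn.png"))
    # -> bundled/isbn.png (no override yet)
    skin_dir = os.path.join(resources, "images", "Extract ISBN")
    os.makedirs(skin_dir)
    open(os.path.join(skin_dir, "isbn.png"), "wb").close()
    print(resolve_icon(resources, "Extract ISBN", "isbn.png", "bundled/isbn.png"))
    # -> <resources>/images/Extract ISBN/isbn.png (the override wins)
```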
All times are GMT -4.