11-11-2011, 08:52 AM | #211 |
Connoisseur
Posts: 52
Karma: 12
Join Date: Jul 2011
Device: none
|
I was afraid that was the case with the filename. As for the priority given to scanning the metadata tags, my first thought would be to make it user-controllable via an option. I would set the default action to "look in metadata if there is no match elsewhere", but I can see where someone might want to reverse that under certain circumstances.
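To make the suggested option concrete, here is a minimal sketch of the fallback rule. It is not code from the plugin: the function name `choose_isbn` and the `prefer_metadata` flag are hypothetical, and both inputs are assumed to be already-extracted ISBN strings (or `None` when nothing was found).

```python
from typing import Optional


def choose_isbn(content_isbn: Optional[str],
                metadata_isbn: Optional[str],
                prefer_metadata: bool = False) -> Optional[str]:
    """Pick one ISBN from two candidate sources.

    Default: use the ISBN found in the book content and fall back to the
    metadata ISBN only when the content scan found nothing.
    With prefer_metadata=True the order is reversed.
    (Hypothetical helper for illustration, not the plugin's API.)
    """
    if prefer_metadata:
        first, second = metadata_isbn, content_isbn
    else:
        first, second = content_isbn, metadata_isbn
    # `or` falls through to the second source when the first is None or empty.
    return first or second
```

With the default, `choose_isbn(None, "9780306406157")` falls back to the metadata value, while a hit in the content always wins.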
|
11-12-2011, 10:08 AM | #212 | |
Groupie
Posts: 156
Karma: 10001
Join Date: Feb 2011
Device: sony
|
Quote:
Then I decided that the garbage level, even in commercial ebooks, was just too high, and maybe ignoring the in-book metadata wasn't so bad after all (and I'm a data miser -- I hate ignoring/discarding potentially useful data).
And while this case:
no ISBN can be found in the book content, but one is present in the metadata.
is rare, this case:
no ISBN can be found in the book content, but an accurate one is present in the metadata.
is really rare.
Unfortunately, this case:
an accurate ISBN can be found in the book content, but a different one is present in the metadata.
is really common, making this:
the ISBN extracted from the book is incorrect, but there is a correct one set in the metadata.
pretty impossible to reliably detect.
I don't mind so much when no ISBN can be found in the content, but these:
the ISBN extracted from the book is for a different book (such as an advertisement for a related book for the publisher).
really nag me, because they're stealthy errors. Maybe the next step is a Verify ISBN plugin that would check the author/title/ISBN against one of the ISBN pools and flag mismatches and not-founds ....
Last edited by capnm; 11-12-2011 at 10:10 AM. |
|
11-12-2011, 10:44 AM | #213 |
Groupie
Posts: 156
Karma: 10001
Join Date: Feb 2011
Device: sony
|
@kiwidude:
I still get the occasional epub where Extract ISBN misses, bafflingly, but they're rare enough that I just shrug, manually find the ISBN in the text, copy, paste, and move on. I'll PM you a sample, to look at if you're curious, to ignore if you're busy. Thanks. |
11-12-2011, 01:48 PM | #214 |
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
v1.4.1 Released
Changes in this release:
@capnm - this fixed the issue with the epub you sent me, thx. |
12-30-2011, 01:33 PM | #215 |
Series Addict
Posts: 6,180
Karma: 167189477
Join Date: Dec 2010
Location: Florida, USA
Device: Kindle Paperwhite (2nd Gen)
|
Question:
Will this overwrite an ISBN that is already there (say, from downloading metadata), or does it just add the extracted one to the others?
Last edited by Nyssa; 12-30-2011 at 02:25 PM. Reason: typo |
12-30-2011, 01:50 PM | #216 |
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@Nyssa - it will always overwrite any existing ISBN if Extract ISBN finds a valid one.
|
12-30-2011, 02:25 PM | #217 |
Series Addict
Posts: 6,180
Karma: 167189477
Join Date: Dec 2010
Location: Florida, USA
Device: Kindle Paperwhite (2nd Gen)
|
Okay. Thank you.
|
03-13-2012, 01:28 AM | #218 | |
Junior Member
Posts: 2
Karma: 10
Join Date: Mar 2012
Device: Sony PRS-T1
|
Regex Tweak
Quote:
But great plugin, keep up the good work, and if I notice any other improvements I will let you know. |
|
03-13-2012, 05:20 AM | #219 |
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@greatdragon - first the disclaimer - I am not a regex guru, I know the basics to get by. However note that it is a \s* not \s+, so it should make no difference to your pdf, as it is 0 or more matches?
I can't remember all the reasons why it is there; there have been many iterations of this plugin over its lifetime to get to where it is today. It may have been to catch some case I can't recall. Or it may have been to "soak up" leading spaces to prevent a document with loads of consecutive spaces reporting as matches (since space is a valid character in the next part of the expression). Now if others who know far more about regex than me agree with your finding then I can look to change it, but I am firmly in the "if it ain't broke don't fix it" camp. Performance isn't a reason if the change were to reduce its effectiveness for some reason, particularly since it runs as a background job. |
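The difference between `\s*` and `\s+` can be checked with a small sketch. The patterns below are simplified stand-ins made up for illustration, not the plugin's actual expression: optional or mandatory leading whitespace followed by a digit run that may contain spaces and hyphens.

```python
import re

# Hypothetical, simplified patterns (NOT the plugin's real regex).
# The capture group is a digit, then 8-15 digits/spaces/hyphens, then a digit.
STAR = re.compile(r"\s*(\d[\d \-]{8,15}\d)")  # zero or more leading whitespace
PLUS = re.compile(r"\s+(\d[\d \-]{8,15}\d)")  # at least one leading whitespace

text_no_space = "ISBN:9780306406157"
text_spaces = "ISBN:    978-0-306-40615-7"

# \s* also matches when there is no whitespace at all before the number.
star_no_space = STAR.search(text_no_space)
# \s+ requires at least one whitespace character, so this finds nothing.
plus_no_space = PLUS.search(text_no_space)

# With leading spaces both variants match, and the spaces are consumed
# outside the capture group, so group(1) starts at the first digit.
star_spaces = STAR.search(text_spaces)
plus_spaces = PLUS.search(text_spaces)
```

So a `\s*` prefix never prevents a match that would succeed without it; it only consumes whitespace that happens to precede the number.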
05-27-2012, 07:09 PM | #220 |
Groupie
Posts: 199
Karma: 76476
Join Date: Feb 2012
Location: Poland
Device: none
|
I have just switched to a new installation of Calibre Portable and now, for some reason, I get an error every time I launch Extract ISBN on a .pdf file ("access violation"). The plugin works impeccably with epub files, and no other errors have occurred in Calibre. Any ideas? All help appreciated.
|
05-28-2012, 02:25 AM | #221 |
Junior Member
Posts: 1
Karma: 10
Join Date: May 2012
Location: Solan, Himachal Pradesh, Bharat Varsh ( India )
Device: none
|
ISBN Extract Plugin (Version 1.4.1) crashes calibre when executed
After I upgraded calibre from 0.8.52 to 0.8.53, the ISBN Extract plugin crashes calibre whenever it is executed.
My system details: OS Windows 7 x64 SP1. I rolled back to calibre 0.8.52 and it is working fine again. Regards, Dinesh. Last edited by Dinesh.kaundal; 05-28-2012 at 02:36 AM. Reason: Update |
05-28-2012, 03:07 AM | #222 |
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Normally I run from the latest source code and the last binary install I had done was 0.8.52 (everything working fine). I just installed the binaries for 0.8.53, then I also find that calibre crashes with 0.8.53 (but only when scanning PDF files.)
Which implies that perhaps Kovid "broke something" in the PDF code (which being C++ is the most likely thing to cause such a crash). @Kovid - here is what my code does where I believe it is crashing: Spoiler:
|
05-28-2012, 07:54 AM | #223 |
creator of calibre
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@kiwidude: 0.8.53 updated to poppler 0.20, which is probably why it's crashing. I've committed some code to enable the XML output from pdftohtml; use that instead, as it will prevent this kind of crash in the future.
pdftohtml(..., as_xml=True) |
05-28-2012, 08:20 AM | #224 |
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@Kovid - thx for that, though if I understand you correctly you are saying to use calibre's existing PDF engine via pdftohtml rather than the poppler stuff via pdfreflow, right?
As IIRC pdftohtml is what this plugin originally used, but we found it to be very, very slow (particularly on graphical pdfs). Whereas using pdfreflow allowed the plugin to scan subsets of only the front few and last few pages. No chance of the pdfreflow stuff getting fixed? |
05-28-2012, 09:07 AM | #225 |
creator of calibre
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
It's not a priority for me. The parts of the poppler API that pdfreflow uses are not stable; they change with pretty much every poppler 0.x release, which makes maintaining them a pain. I am switching the new PDF engine to use pdftohtml -xml, which produces the same kind of output as pdfreflow, the upside being that I no longer have to maintain pdfreflow's C++ code. The downside, from your perspective, is that pdftohtml does not support specifying a PDF page range for conversion. You have four choices:
1) Maintain pdfreflow yourself; I'm happy to accept patches.
2) Ask the poppler people to implement page ranges for pdftohtml.
3) Use another PDF library (calibre has both podofo and pypdf) to first extract the relevant pages and then run pdftohtml on them.
4) Live with the reduced performance. |
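For choice 3, the first step is deciding which pages to pull out before handing them to pdftohtml. A minimal sketch of that selection follows. The function name `pages_to_scan` and its default counts are hypothetical (not the plugin's actual values), and the extraction itself with podofo or pypdf is deliberately not shown.

```python
from typing import List


def pages_to_scan(page_count: int, front: int = 10, back: int = 5) -> List[int]:
    """Return sorted 0-based indices of the first `front` and last `back` pages.

    Short documents are handled by clamping to the page count and removing
    duplicates where the two ranges overlap.
    (Hypothetical helper; the defaults are illustrative assumptions.)
    """
    if page_count <= 0:
        return []
    head = range(0, min(front, page_count))
    tail = range(max(page_count - back, 0), page_count)
    return sorted(set(head) | set(tail))
```

For a 100-page PDF, `pages_to_scan(100, front=3, back=2)` selects pages 0, 1, 2, 98 and 99; a 4-page PDF with the same settings yields all four pages once.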
|