11-11-2011, 08:52 AM   #211
jlutes
Connoisseur
jlutes began at the beginning.
 
Posts: 52
Karma: 12
Join Date: Jul 2011
Device: none
I was afraid that was the case with the filename. As for priority when scanning the metadata tags, my first thought would be to make it user-controllable via an option. I would set the default to "look in the metadata only if there is no match elsewhere", but I can see where someone might want to reverse that under certain circumstances.
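A minimal sketch of the user-controllable priority jlutes describes, assuming the plugin already has one candidate ISBN from the book content and one from the embedded metadata. The function and parameter names (`choose_isbn`, `prefer_metadata`) are hypothetical and not part of the plugin.

```python
def choose_isbn(content_isbn, metadata_isbn, prefer_metadata=False):
    """Pick an ISBN from two candidate sources (hypothetical helper).

    content_isbn:    ISBN found by scanning the book text, or None
    metadata_isbn:   ISBN read from the book's embedded metadata, or None
    prefer_metadata: False (the suggested default) consults the metadata
                     only when the content scan finds nothing; True
                     reverses that priority.
    """
    if prefer_metadata:
        first, second = metadata_isbn, content_isbn
    else:
        first, second = content_isbn, metadata_isbn
    # Fall back to the second source only if the first produced nothing.
    return first if first else second
```

Keeping the default as "content first" matches the suggestion above; the flag simply swaps the order in which the two sources are consulted.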
11-12-2011, 10:08 AM   #212
capnm
Groupie
capnm knows the difference between 'who' and 'whom'
 
Posts: 156
Karma: 10001
Join Date: Feb 2011
Device: sony
Quote:
Originally Posted by kiwidude
I can see a case for wanting the Extract ISBN plugin to attempt to read it from metadata.
When this plugin was in its infancy, I really wanted to be able to pull the ISBN from the in-book metadata, either as a fallback or for comparison, and was frustrated that there was no easy way to get information from an already-added book's internal metadata into the calibre database.

Then I decided that the garbage level, even in commercial ebooks, was just too high, and that maybe ignoring the in-book metadata wasn't so bad after all (and I'm a data miser: I hate ignoring or discarding potentially useful data).


And while this case:
no ISBN can be found in the book content, but one is present in the metadata.
is rare, this case:
no ISBN can be found in the book content, but an accurate one is present in the metadata.
is really rare.

Unfortunately, this case:
an accurate ISBN can be found in the book content, but a different one is present in the metadata.
is really common, making this:
the ISBN extracted from the book is incorrect, but there is a correct one set in the metadata.
nearly impossible to detect reliably.


I don't mind so much when no ISBN can be found in the content, but these:
the ISBN extracted from the book is for a different book (such as an advertisement for a related book for the publisher).
really nag me, because they're stealthy errors.

Maybe the next step is a Verify ISBN plugin that would check the author/title/ISBN against one of the ISBN pools and flag mismatches and not-founds ....
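A rough sketch of the Verify ISBN idea, assuming a lookup source that maps an ISBN to its (title, author) record. Here a plain dict stands in for an online ISBN pool; `ISBN_POOL`, `verify`, the sample ISBNs and the status labels are all invented for illustration, and a real plugin would query an actual metadata source instead.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't count as mismatches."""
    return " ".join(text.lower().split())


def verify(isbn, title, author, pool):
    """Classify one book's ISBN against a lookup pool.

    Returns 'not_found' if the pool has no record for the ISBN,
    'mismatch' if the record's title or author differs from the book's,
    and 'ok' otherwise.
    """
    record = pool.get(isbn)
    if record is None:
        return "not_found"
    pool_title, pool_author = record
    if normalize(pool_title) != normalize(title) or normalize(pool_author) != normalize(author):
        return "mismatch"
    return "ok"


# Hypothetical stand-in for an ISBN pool: ISBN -> (title, author).
ISBN_POOL = {
    "9780000000001": ("Example Novel", "A. Writer"),
    "9780000000002": ("Publisher Catalogue Title", "B. Other"),
}

books = [
    ("9780000000001", "Example Novel", "A. Writer"),
    ("9780000000002", "Example Sequel", "A. Writer"),  # ISBN taken from an advert page
    ("9780000000003", "Obscure Book", "C. Nobody"),
]
for isbn, title, author in books:
    print(isbn, verify(isbn, title, author, ISBN_POOL))
```

Exact matching after normalization is deliberately simple; real titles often differ in subtitles or punctuation, so a practical checker would likely need a fuzzier comparison before flagging a mismatch.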

Last edited by capnm; 11-12-2011 at 10:10 AM.
11-12-2011, 10:44 AM   #213
capnm
@kiwidude:
I still get the occasional epub where Extract ISBN misses, bafflingly, but they're rare enough I just shrug, manually find the ISBN in the text, copy, paste, and move on.

I'll PM you a sample: look at it if you're curious, ignore it if you're busy.

Thanks.
11-12-2011, 01:48 PM   #214
kiwidude
Calibre Plugins Developer
kiwidude ought to be getting tired of karma fortunes by now.
 
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
v1.4.1 Released

Changes in this release:
  • Leading spaces before the ISBN are now excluded; previously they prevented some valid ISBNs from being detected.
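The plugin's actual regex isn't shown in this thread, so the following is only a sketch of the general approach the changelog implies: capture a candidate that starts with a digit (so leading whitespace stays outside the match), strip the separators, and accept it only if the standard ISBN-10 or ISBN-13 check digit is correct. `CANDIDATE_RE`, `is_valid_isbn` and `find_isbns` are assumptions for illustration.

```python
import re

# Hypothetical candidate pattern: the match must START with a digit, so any
# whitespace before the number is never part of the captured text.
CANDIDATE_RE = re.compile(r"\d[\d\- ]{8,15}[\dX]")


def is_valid_isbn(digits):
    """Check an ISBN-10 or ISBN-13 (separators already removed) via its check digit."""
    if len(digits) == 10 and digits[:9].isdigit() and (digits[9].isdigit() or digits[9] == "X"):
        values = [int(c) for c in digits[:9]] + [10 if digits[9] == "X" else int(digits[9])]
        # ISBN-10: weights 10 down to 1; the weighted sum must be divisible by 11.
        return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0
    if len(digits) == 13 and digits.isdigit():
        # ISBN-13: alternating weights 1 and 3; the weighted sum must be divisible by 10.
        return sum((1 if i % 2 == 0 else 3) * int(c) for i, c in enumerate(digits)) % 10 == 0
    return False


def find_isbns(text):
    """Return every checksum-valid ISBN found in text, with separators removed."""
    found = []
    for match in CANDIDATE_RE.finditer(text):
        digits = match.group(0).replace("-", "").replace(" ", "")
        if is_valid_isbn(digits):
            found.append(digits)
    return found


print(find_isbns("ISBN:   0-306-40615-2"))  # ['0306406152']
```

The checksum step is what keeps look-alikes such as phone numbers out: "555-123-4567" matches the pattern but fails the ISBN-10 check.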

@capnm - this fixed the issue with the epub you sent me, thx.
12-30-2011, 01:33 PM   #215
Nyssa
Series Addict
Nyssa ought to be getting tired of karma fortunes by now.
 
Posts: 6,180
Karma: 167189477
Join Date: Dec 2010
Location: Florida, USA
Device: Kindle Paperwhite (2nd Gen)
Question:

Will this overwrite an ISBN that is already there (say, from downloading metadata), or does it just add the extracted one to the others?

Last edited by Nyssa; 12-30-2011 at 02:25 PM. Reason: typo
12-30-2011, 01:50 PM   #216
kiwidude
@Nyssa - it will always overwrite any existing ISBN if Extract ISBN finds a valid one.
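The update rule described here can be sketched as follows, assuming the book's identifiers are held in a dict like calibre's identifiers field (for example `{'isbn': ..., 'amazon': ...}`). Only the `isbn` entry is replaced; other identifier types are kept. The helper name `apply_extracted_isbn` is hypothetical.

```python
def apply_extracted_isbn(identifiers, extracted_isbn):
    """Return an updated copy of an identifiers dict (hypothetical helper).

    A valid extracted ISBN always replaces any existing 'isbn' entry;
    if nothing valid was extracted, the identifiers are returned unchanged.
    Other identifier types (e.g. 'amazon') are never touched.
    """
    updated = dict(identifiers)  # copy so the caller's dict is not mutated
    if extracted_isbn:
        updated["isbn"] = extracted_isbn
    return updated
```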
12-30-2011, 02:25 PM   #217
Nyssa
Okay. Thank you.
03-13-2012, 01:28 AM   #218
greatdragon
Junior Member
greatdragon began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Mar 2012
Device: Sony PRS-T1
Regex Tweak

Quote:
Originally Posted by sdspieg
Super plugin! Thanks much... Calibre still keeps getting better and better...

Any idea why it does not work on all files though? I have some books in my collection for which I CAN find the isbn number when I open the pdf file and look for it myself, but that the plugin didn't get right... Would you be interested in some books for which it doesn't work?

Cheers,

-Stephan
There is a slight modification you could make to the regex you are using that would also reduce its workload: the "\s*" at the start of your regex is unneeded, and in some of my formatted PDFs it caused an issue because there were no valid spaces, so the regex failed to match. Removing it also offers a small performance boost; not a huge one, but with regexes anything you can trim out saves CPU.


But it's a great plugin. Keep up the good work, and if I notice any other improvements I will let you know.
03-13-2012, 05:20 AM   #219
kiwidude
@greatdragon - first the disclaimer: I am not a regex guru; I know enough of the basics to get by. However, note that it is \s* and not \s+, so it should make no difference to your PDF, since * allows zero or more matches?
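The `*` versus `+` point is easy to check with Python's `re` module. The two patterns below are simplified, hypothetical stand-ins for the plugin's expression (only the leading quantifier differs): with `\s*` the number is still found when no space precedes it, while `\s+` fails in that case.

```python
import re

# Simplified, hypothetical patterns: only the leading quantifier differs.
STAR = re.compile(r"\s*(\d{10})")  # \s* : zero or more whitespace characters
PLUS = re.compile(r"\s+(\d{10})")  # \s+ : one or more whitespace characters


def first_match(pattern, text):
    """Return the captured digits of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


for text in ("ISBN:0306406152", "ISBN:   0306406152"):
    print(repr(text), first_match(STAR, text), first_match(PLUS, text))
```

This is consistent with the reading that a leading `\s*` on its own should not cause a failure when there are no spaces.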

I can't remember all the reasons why it is there; there have been many iterations of this plugin over its lifetime to get it to where it is today. It may have been to catch some case I can't recall. Or it may have been to "soak up" leading spaces, to prevent a document with loads of consecutive spaces from reporting matches (since a space is a valid character in the next part of the expression).

Now, if others who know far more than me about regex agree with your finding, then I can look at changing it, but I am firmly in the "if it ain't broke, don't fix it" camp. Performance alone isn't a reason to make a change that might reduce its effectiveness, particularly since it runs as a background job.
05-27-2012, 07:09 PM   #220
Joanna
Groupie
Joanna understands the mechanisms of the catecholamine pathways.
 
Posts: 199
Karma: 76476
Join Date: Feb 2012
Location: Poland
Device: none
I have just switched to a new installation of Calibre Portable and now, for some reason, I get an error ("access violation") every time I launch Extract ISBN on a .pdf file. The plugin works impeccably with epub files, and no other errors have occurred in Calibre. Any ideas? All help appreciated.
05-28-2012, 02:25 AM   #221
Dinesh.kaundal
Junior Member
Dinesh.kaundal began at the beginning.
 
Posts: 1
Karma: 10
Join Date: May 2012
Location: Solan, Himachal Pradesh, Bharat Varsh (India)
Device: none
Extract ISBN plugin (version 1.4.1) crashes calibre when executed

When I upgraded calibre from 0.8.52 to 0.8.53, the Extract ISBN plugin started crashing calibre when executed.
My system details:
OS: Windows 7 x64 SP1

I rolled back to calibre 0.8.52 and it is working fine again.

Regards

Dinesh

Last edited by Dinesh.kaundal; 05-28-2012 at 02:36 AM. Reason: Update
05-28-2012, 03:07 AM   #222
kiwidude
Normally I run from the latest source code, and the last binary install I had done was 0.8.52 (everything working fine). I have just installed the 0.8.53 binaries, and I too find that calibre crashes with 0.8.53 (but only when scanning PDF files).

This implies that perhaps Kovid "broke something" in the PDF code (which, being C++, is the most likely thing to cause such a crash).

@Kovid - here is what my code does where I believe it is crashing:
Code:
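    def _read_pdf_txt(self, book_path, start_page, end_page):
        # Imports used below (calibre 0.8.x runs on Python 2, hence `unicode`)
        from lxml import etree
        from calibre import CurrentDir
        from calibre.constants import plugins
        from calibre.ptempfile import PersistentTemporaryDirectory
        from calibre.utils.cleantext import clean_ascii_chars
        # pdfreflow is calibre's C++ extension built on poppler
        pdfreflow, pdfreflow_err = plugins['pdfreflow']
        if pdfreflow_err:
            raise RuntimeError('Failed to load pdfreflow: %s' % pdfreflow_err)
        with open(book_path, 'rb') as stream:
            tdir = PersistentTemporaryDirectory('_isbn')
            with CurrentDir(tdir):
                # Reflow only the requested page range; the result is read back from index.xml
                pages = pdfreflow.reflow(stream.read(), start_page, end_page)
                with open('index.xml', 'rb') as f:
                    xml = f.read()
        # Remove control characters that would break XML parsing, then keep only the text
        root = etree.fromstring(clean_ascii_chars(xml))
        txt = etree.tostring(root, method='text', encoding=unicode)
        return (pages, txt)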
05-28-2012, 07:54 AM   #223
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@kiwidude: 0.8.53 updated to poppler 0.20, which is probably why it's crashing. I've committed some code to enable the XML output from pdftohtml; use that instead, as it will prevent this kind of crash in the future.

pdftohtml(..., as_xml=True)
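For reference, poppler's `pdftohtml -xml` writes a `pdf2xml` document with one `page` element per page and `text` elements for runs of text. The sketch below assumes that layout and uses only the standard library to pull plain text out per page, roughly what `_read_pdf_txt` does with pdfreflow's `index.xml`. The sample XML is made up and trimmed to the elements used, and `text_by_page` is a hypothetical helper; how calibre's `pdftohtml(..., as_xml=True)` delivers the XML is not shown here.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the pdf2xml layout (trimmed to the elements used below).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<pdf2xml>
<page number="1"><text top="100" left="50">Copyright 2011</text>
<text top="120" left="50">ISBN <b>978-0-306-40615-7</b></text></page>
<page number="2"><text top="100" left="50">Chapter One</text></page>
</pdf2xml>"""


def text_by_page(xml_bytes):
    """Map page number -> plain text joined from that page's <text> elements."""
    root = ET.fromstring(xml_bytes)
    pages = {}
    for page in root.iter("page"):
        # itertext() also picks up text nested in <b>/<i> formatting tags.
        lines = ["".join(t.itertext()) for t in page.iter("text")]
        pages[int(page.get("number"))] = "\n".join(lines)
    return pages


print(text_by_page(SAMPLE.encode("utf-8")))
```

Keeping the text grouped by page makes it easy to scan only the front and back pages afterwards, even if the whole document had to be converted.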
05-28-2012, 08:20 AM   #224
kiwidude
@Kovid - thx for that, though if I understand you correctly you are saying to use calibre's existing PDF engine via pdftohtml rather than the poppler stuff via pdfreflow, right?

IIRC, pdftohtml is what this plugin originally used, but we found it to be very, very slow (particularly on graphical PDFs), whereas using pdfreflow allowed the plugin to scan only a subset of pages: the first few and the last few.
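The "first few and last few pages" strategy comes down to choosing page numbers before any text extraction. A minimal sketch with hypothetical `front`/`back` defaults (the thread does not say how many pages the plugin actually scans):

```python
def pages_to_scan(page_count, front=10, back=5):
    """Return sorted 1-based page numbers: the first `front` and last `back` pages.

    Overlapping ranges are merged, so a short document is simply scanned in full.
    """
    first = range(1, min(front, page_count) + 1)
    last = range(max(page_count - back + 1, 1), page_count + 1)
    return sorted(set(first) | set(last))


print(pages_to_scan(100))  # pages 1-10 and 96-100
```

The resulting list could then be handed to whatever extracts the text. With a current release of the pypdf library, for example, those pages could be copied into a smaller PDF via `PdfWriter.add_page(reader.pages[n - 1])` before running pdftohtml on it; that combination is an assumption, not something tested in this thread.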

No chance of the pdfreflow stuff getting fixed?
05-28-2012, 09:07 AM   #225
kovidgoyal
It's not a priority for me. The parts of the poppler API that pdfreflow uses are not stable; they change with pretty much every poppler 0.x release, which makes maintaining them a pain. I am switching the new PDF engine to use pdftohtml -xml, which produces the same kind of output as pdfreflow; the upside is that I no longer have to maintain pdfreflow's C++ code. The downside, from your perspective, is that pdftohtml does not support specifying a PDF page range for conversion. You have four choices:

1) Maintain pdfreflow yourself; I'm happy to accept patches.

2) Ask the poppler people to implement page ranges for pdftohtml

3) Use another pdf library (calibre has both podofo and pypdf) to first extract the relevant pages and then run pdftohtml on them.

4) Live with the reduced performance

Similar Threads
Thread Thread Starter Forum Replies Last Post
Extract ISBN from PDF? mdroberts Calibre 14 12-16-2016 07:32 AM
[Old Thread] Extract ISBN from file name ChristianQ Calibre 59 12-09-2015 05:08 AM
[GUI Plugin] Plugin Updater **Deprecated** kiwidude Plugins 159 06-19-2011 12:27 PM
[Old Thread] Auto Extract ISBN-Feature request UnraisedArc Calibre 60 03-23-2011 09:31 AM
Displaying ISBN column in the main GUI tilleydog Library Management 26 02-25-2011 04:08 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.