
MobileRead Forums > E-Book Software > Calibre

04-14-2010, 11:25 PM #1
pilx
Junior Member
pilx began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Apr 2010
Device: android
ISBN scraping out of PDF

Hi, all.

I'm wondering whether any work has been done on scraping ISBNs out of PDF content.

I'm trying to get a big ebook collection into calibre, but setting the ISBN by hand, one book at a time, would take more than a lifetime... so I thought of scraping ISBNs from the PDF content with a regex. I started writing a plugin for calibre, but then thought I'd ask here whether something like this had already been done or discussed.

Anyone?
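A minimal Python sketch of the regex idea, assuming the PDF text has already been extracted into a string. The names `ISBN_RE`, `find_isbns`, `is_valid_isbn10` and `is_valid_isbn13` are hypothetical (not part of calibre). Requiring the literal "ISBN" label and checking the ISBN check digit keeps page numbers, years and phone numbers from being picked up.

```python
import re

# Hypothetical pattern (not calibre code): the label "ISBN", optionally
# followed by "-10"/"-13" or " 10"/" 13" and a colon, then 10 to 13 digits
# that may be separated by single hyphens or spaces; the last char may be X.
ISBN_RE = re.compile(r'ISBN(?:[-\s]?1[03])?:?\s*((?:\d[-\s]?){9,12}[\dX])',
                     re.IGNORECASE)


def is_valid_isbn10(s):
    """ISBN-10 check: sum of digit * weight (10 down to 1) divisible by 11."""
    if len(s) != 10 or not s[:9].isdigit() or not (s[9].isdigit() or s[9] == 'X'):
        return False
    total = sum((10 - i) * int(c) for i, c in enumerate(s[:9]))
    total += 10 if s[9] == 'X' else int(s[9])
    return total % 11 == 0


def is_valid_isbn13(s):
    """ISBN-13 check: digits weighted 1, 3, 1, 3, ...; sum divisible by 10."""
    if len(s) != 13 or not s.isdigit():
        return False
    return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s)) % 10 == 0


def find_isbns(text):
    """Return checksum-valid ISBNs that follow an 'ISBN' label, in order."""
    found = []
    for match in ISBN_RE.finditer(text):
        # Drop separators, keeping digits and a possible trailing X.
        chars = re.sub(r'[^0-9X]', '', match.group(1).upper())
        # The match may have swallowed an unrelated trailing number,
        # so try a 13-character prefix first, then a 10-character one.
        for candidate, is_valid in ((chars[:13], is_valid_isbn13),
                                    (chars[:10], is_valid_isbn10)):
            if is_valid(candidate):
                if candidate not in found:
                    found.append(candidate)
                break
    return found
```

For example, `find_isbns("ISBN-13: 978-0-306-40615-7")` returns `['9780306406157']`. Requiring the label will miss ISBNs printed without it; dropping the label from the pattern finds more candidates but leans much harder on the checksum to weed out false positives.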
04-14-2010, 11:32 PM #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 43,851
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
There isn't any existing work, but basically all you need to do is the following:

Modify the PDF metadata reading code (in calibre.ebooks.metadata.pdf).

calibre contains a nice library for PDF reflow that converts PDF to XML. Use that, and then search for the ISBN in the XML.

Basically:

Code:
# stream: the open PDF file; temp_dir: a scratch directory
with CurrentDir(temp_dir):
    pdfreflow.reflow(stream.read())
This will create index.xml in the current directory, i.e. temp_dir.
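A sketch of the "search for the ISBN in the XML" step in Python, assuming index.xml is well-formed XML with the page text in its text nodes. No particular tag names from pdfreflow's output are assumed, since every text node is collected. The names `LABELLED_ISBN_RE`, `xml_text` and `isbn_candidates_from_xml` are hypothetical, and the candidates returned are not checksum-validated.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: an "ISBN" label (optionally "-10"/"-13" and a colon)
# followed by 10 to 13 digits separated by hyphens or spaces; last char may be X.
LABELLED_ISBN_RE = re.compile(
    r'ISBN(?:[-\s]?1[03])?:?\s*((?:\d[-\s]?){9,12}[\dX])', re.IGNORECASE)


def xml_text(xml_bytes):
    """Concatenate every text node in the document, separated by spaces."""
    root = ET.fromstring(xml_bytes)
    return ' '.join(t.strip() for t in root.itertext() if t.strip())


def isbn_candidates_from_xml(xml_bytes):
    """Return separator-free ISBN candidates (not checksum-validated)."""
    text = xml_text(xml_bytes)
    return [re.sub(r'[^0-9X]', '', m.group(1).upper())
            for m in LABELLED_ISBN_RE.finditer(text)]
```

After `reflow` has run, something like `isbn_candidates_from_xml(open(os.path.join(temp_dir, 'index.xml'), 'rb').read())` would list the candidates. Text nodes are joined with spaces because a line of PDF text may be split across several XML elements, leaving the `ISBN` label and its number in different nodes. If the file contains characters that are illegal in XML, `ET.fromstring` raises `ParseError` and the text would need cleaning first.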
04-14-2010, 11:47 PM #3
pilx
OK... I've just found tickets 3013 and 4113 in calibre's Trac.

Too bad I haven't found a PDF on which the feature actually worked.

I'm willing to help improve that code. If I can find it first!
04-14-2010, 11:51 PM #4
pilx
So the code mentioned in those tickets isn't there anymore?

Anyway, Kovid, thanks for the pointer, and for calibre by the way!

I'll try to go that way... and report back on my findings.
04-15-2010, 12:58 AM #5
pilx
Where can I find the source code or API documentation for pdfreflow?
04-15-2010, 01:01 AM #6
kovidgoyal
See ebooks/pdf/main.cpp in the calibre source.

Similar Threads
Extract ISBN from PDF? (started by mdroberts in Calibre; 14 replies; last post 12-16-2016 07:32 AM)
Tool: ISBN to Name (started by MarkDXG in Kindle Developer's Corner; 2 replies; last post 10-04-2010 09:15 AM)
Updating Metadata without and ISBN (started by herbycanopy in Calibre; 7 replies; last post 05-22-2010 01:16 AM)
Question about ISBN numbers (started by ficbot in Calibre; 2 replies; last post 12-04-2009 11:02 PM)
isbn (esbn) (started by HenryP in Writers' Corner; 4 replies; last post 02-22-2009 08:49 AM)


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.