Old 10-24-2013, 10:48 AM   #1
bibihoma
Junior Member
bibihoma began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Oct 2013
Device: kindle
importing ebook and extracting content

Hi!
[ skip this if you are in a hurry ....

I have been using calibre for months (without any plans to dig into its code), and I recently got the idea of an application that helps with learning vocabulary, using ebooks as a database of "in context" translations.

Unfortunately, my development skills are a bit rusty, and it is taking me longer than I thought to develop this Django application. My few attempts at writing an .fb2 paragraph and section extractor myself also showed that I would be better off reusing what has already been done.

Anyway... enough context, let's get to the request itself: calibre is not only a great library manager and ebook converter, it also seems to be the Python reference for ebook content extraction. Unfortunately, it is not published as a standalone module, and its code base is huge!

My understanding is that every book is mapped to ebooks.oeb.base at some point in the conversion chain. So, in your opinion, should I try to instantiate ebooks.oeb.base and use it to extract ebook information? If so, I would appreciate pointers to documentation or similar code, if you know of any.

Alternatively, I had a look at the calibre viewer, since it needs to access the ebook content (like my application). The load_ebook function in the gui2 viewer's main.py seems a good example:
https://github.com/bibihoma/calibre/...viewer/main.py (load_ebook function). This suggests that I should rather use calibre.ebooks.oeb.iterator.book to navigate within a book.
Any comments on which approach is best?

In case someone has read this post up to this point,]

the short question is: given an ebook path, how do I load the ebook into a Python structure and access its chapters and paragraphs in sequence?

Thanks, bibihoma
Old 10-24-2013, 11:58 AM   #2
kovidgoyal
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Which approach you use depends on what you are trying to do. If all you want to do is extract the HTML content from the ebook, then either oeb.iterator or oeb.polish.container will work.
Old 10-26-2013, 11:00 AM   #3
bibihoma
Thanks for the quick answer.

Unfortunately, I am developing under Windows. I tried running setup.py after downloading the sources, but after installing a few dependencies (Qt, ...) I hit an "Unable to find vcvarsall.bat" error, which ended my hopes of testing your suggestion.

Do you have any plans to package calibre into submodules that could be installed with a simple pip install?

$ python setup.py
Traceback (most recent call last):
  File "setup.py", line 13, in <module>
    import setup.commands as commands
  File "c:\Users\MCProject\calibre\setup\commands.py", line 34, in <module>
    from setup.extensions import Build
  File "c:\Users\MCProject\calibre\setup\extensions.py", line 16, in <module>
    from setup.build_environment import (chmlib_inc_dirs,
  File "c:\Users\MCProject\calibre\setup\build_environment.py", line 25, in <module>
    msvc.initialize()
  File "c:\Python27\lib\distutils\msvc9compiler.py", line 383, in initialize
    vc_env = query_vcvarsall(VERSION, plat_spec)
  File "c:\Python27\lib\distutils\msvc9compiler.py", line 271, in query_vcvarsall
    raise DistutilsPlatformError("Unable to find vcvarsall.bat")
distutils.errors.DistutilsPlatformError: Unable to find vcvarsall.bat
Old 10-26-2013, 11:19 AM   #4
kovidgoyal
http://manual.calibre-ebook.com/deve...-your-projects
Old 10-27-2013, 09:03 AM   #5
bibihoma
Thanks, indeed reading AND applying the documentation helps. Sorry for wasting your time... I will delete my post above, as it has nothing to do with the topic (and it is embarrassing for me).

Still, one more question: the documentation mentions that we can run a script with this syntax:
$ calibre-debug manage.py -- --runserver

I am not at all familiar with calibre-debug, but it does NOT seem to use the packages installed in my Python. I have Django installed on my Python 2.7, and the Django launch command works fine (manage.py runserver).

However, running the Django launch code through calibre-debug fails when it tries to import the django package:
"Python function terminated unexpectedly
No module named django.core.management (Error Code: 1)"

Is it because calibre-debug looks for packages in the calibre src folder only?
If so, is your advice to copy the django package into the calibre src folder? Or is there an option that can be activated to allow the use of installed Python packages?

Thanks
Old 10-27-2013, 09:18 AM   #6
kovidgoyal
Yes, calibre-debug uses only the calibre src folder. You have many options to get around that. First, run this:

calibre-debug -c "import sys; print sys.path"

If you put your django folder in one of the folders listed there, it will be used.

Alternatively, you can modify manage.py to do this at the top:

import sys
sys.path.append('/path/to/django')

before importing any django modules.
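
A minimal sketch of that second option, for illustration only. The helper name `ensure_importable` is hypothetical (it is not part of calibre or Django). It tries the import first and extends sys.path only if the module cannot be found. Note that the entry added to sys.path has to be the folder that contains the package folder, not the package folder itself.

```python
import importlib
import sys


def ensure_importable(module_name, extra_dir):
    """Import module_name, adding extra_dir to sys.path only if needed.

    extra_dir must be the directory that CONTAINS the package
    (for example a site-packages folder), not the package itself.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Not found on the current path: extend it and retry once.
        if extra_dir not in sys.path:
            sys.path.append(extra_dir)
        return importlib.import_module(module_name)
```

With the site-packages location mentioned later in this thread, the call at the top of manage.py would look like `django = ensure_importable("django", "c:/Python27/Lib/site-packages")`.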
Old 10-28-2013, 10:32 AM   #7
bibihoma
Thanks Kovidgoyal!

I don't know how clean this solution is... but it works

In case someone needs to use Django and other libraries installed in their Python, here is a simple script that adds all of the Python site-packages folders to the path:

import os, sys
libFolder = "c:/Python27/Lib/site-packages"
# os.walk already yields each folder's full path in root
for root, dirs, files in os.walk(libFolder):
    sys.path.append(root)
Old 10-29-2013, 09:59 AM   #8
bibihoma
Re Kovidgoyal,

Sorry to ask for your help once again, this time to go beyond my first major achievement: extracting the title of the ebook ;-)
I am now trying to extract the content itself. Should I use load_html?
If so, how can I instantiate a View to pass to this function?

Code:
 
import logging
from calibre.ebooks.oeb.iterator.book import EbookIterator
from calibre.ebooks.oeb.display.webview import load_html

logger = logging.getLogger(__name__)

iterator = EbookIterator("C:/Users/MCProject/Dropbox/Colin/DjangAptana/mysite/kdfr.fb2")
iterator.__enter__()
logger.debug(iterator)
logger.debug(iterator.opf.title)
for doc in iterator.spine:
    print doc
    # 'view' is not defined anywhere: this is the object I do not know how to create
    load_html(doc, view, codec=getattr(doc, 'encoding', 'utf-8'), mime_type=getattr(doc, 'mime_type', 'text/html'))
Thanks, Bibihoma
Old 10-29-2013, 10:55 AM   #9
kovidgoyal
You don't load the HTML; you simply open the file and read it in to get the HTML.

html = open(doc, 'rb').read()
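
Once the HTML has been read in, the paragraphs still have to be pulled out of it. The following is a minimal sketch using only the standard library, not a calibre API. It is written for Python 3 (on Python 2 the same class lives in the `HTMLParser` module). The names `ParagraphExtractor`, `extract_paragraphs` and `paragraphs_in_reading_order` are hypothetical, and `spine_paths` is assumed to be a sequence of file paths such as the items of iterator.spine in the code posted above. It also assumes that paragraphs are marked up as <p> elements and that the files are UTF-8 encoded.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text of each <p> element in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._depth = 0    # greater than 0 while inside a <p> element
        self._chunks = []  # text fragments of the current paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join the fragments and collapse runs of whitespace.
                text = " ".join("".join(self._chunks).split())
                if text:
                    self.paragraphs.append(text)
                self._chunks = []

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_paragraphs(html_bytes, encoding="utf-8"):
    """Return the paragraph texts found in one HTML document."""
    parser = ParagraphExtractor()
    parser.feed(html_bytes.decode(encoding, errors="replace"))
    parser.close()
    return parser.paragraphs


def paragraphs_in_reading_order(spine_paths):
    """Yield (path, paragraph) pairs, one spine file after another."""
    for path in spine_paths:
        with open(path, "rb") as f:
            for para in extract_paragraphs(f.read()):
                yield path, para
```

Inline markup such as <b> or <i> is ignored and its text is kept, so each paragraph comes back as one plain string. Chapter boundaries are not detected here; treating each spine file as one chapter is only a rough approximation.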