10-24-2013, 10:48 AM   #1
bibihoma
importing ebook and extracting content

Hi!
[ skip this if you are in a hurry ....

I have been using calibre for months (without any plans to dig into its code) and recently got the idea for an application that helps with learning vocabulary, using ebooks as a database of "in context" translations.

Unfortunately, my development skills are a bit rusty and it is taking me longer than I thought to develop this Django application. Also, my few attempts at developing a .fb2 paragraph and section extractor myself showed that I would do better to reuse what has already been done.

Anyway... enough context, let's get to the request itself: calibre is not only a great library manager/ebook converter/..., it also seems to be the Python reference for ebook content extraction. Unfortunately, it is not published as a standalone module, and its code base is huge!

My understanding is that every book is mapped to ebooks.oeb.base at some point in the conversion chain. So, in your opinion, should I try to instantiate ebooks.oeb.base and use it to extract ebook information? If so, I would appreciate it if you could point me to information that could help, or to similar code if you know of any.

Alternatively, I had a look at the calibre viewer, as it needs to access the ebook content (like my application): the load_ebook function in the gui2 viewer's main.py seems a good example.
https://github.com/bibihoma/calibre/...viewer/main.py (load_ebook function). This suggests that I should rather use calibre.ebooks.oeb.iterator.book to navigate within a book.
Any comments on which approach is best?

In case someone has read this post up to this point,]

the short question is: given an ebook path, how do I load the ebook into a Python structure and access its chapters and paragraphs in sequence?
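
To make the target concrete, here is a minimal sketch of the kind of structure I am after. It covers the .fb2 case only and uses the Python standard library, not calibre. The helper name `iter_fb2_sections`, the `(title, paragraphs)` tuple shape and the sample book are all made up for illustration, and I am assuming a well-formed FictionBook 2.0 file where each chapter is a `<section>` holding `<p>` elements. Ideally calibre would give me something equivalent for any input format.

```python
import io
import xml.etree.ElementTree as ET

# Namespace used by FictionBook 2.0 documents.
FB2_NS = "{http://www.gribuser.ru/xml/fictionbook/2.0}"


def iter_fb2_sections(source):
    """Yield (title, paragraphs) for each <section>, in document order.

    `source` is a file path or a binary file object.
    Hypothetical helper for illustration; not part of calibre.
    """
    root = ET.parse(source).getroot()
    for section in root.iter(FB2_NS + "section"):
        title_el = section.find(FB2_NS + "title")
        title = ""
        if title_el is not None:
            # Collapse the title's nested text and whitespace into one line.
            title = " ".join("".join(title_el.itertext()).split())
        # Only direct <p> children, so nested sections keep their own text.
        paragraphs = [
            "".join(p.itertext()).strip()
            for p in section.findall(FB2_NS + "p")
        ]
        yield title, paragraphs


# Tiny made-up book, so the sketch runs without a real file.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">
  <body>
    <section>
      <title><p>Chapter 1</p></title>
      <p>First paragraph.</p>
      <p>Second <emphasis>paragraph</emphasis>.</p>
    </section>
    <section>
      <title><p>Chapter 2</p></title>
      <p>Third paragraph.</p>
    </section>
  </body>
</FictionBook>"""

for title, paragraphs in iter_fb2_sections(io.BytesIO(SAMPLE)):
    print(title, paragraphs)
```

This ignores everything a real book throws at you (poems, cites, footnote bodies, broken markup), which is exactly why I would prefer to reuse calibre's parsing instead of growing my own.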

Thanks, bibihoma