MobileRead Forums - View Single Post

bibihoma · 10-24-2013, 11:48 AM

Hi!
[ skip this if you are in a hurry ....

I have been using calibre for months (without any plans to dig into its code) and recently got the idea for an application that helps with learning vocabulary, using ebooks as a database of "in context" translations.

Unfortunately, my development skills are a bit rusty and it is taking me longer than I thought to develop this Django application. Also, my few attempts at writing a .fb2 paragraph and section extractor myself showed that I would do better to reuse what has already been done.

Anyway... enough context, let's get to the request itself: calibre is not only a great library manager/ebook converter/..., it also seems to be the Python reference for ebook content extraction. Unfortunately, it is not published as a standalone module, and its code base is just huge!

My understanding is that every book will be mapped to ebooks.oeb.base at some point in the conversion chain. So in your opinion, should I try to instantiate ebooks.oeb.base and use it to extract ebook information? If so, I would appreciate it if you could point me to helpful information or similar code, if you know of any.

Alternatively, I had a look at the calibre viewer, since it needs to access the ebook content (like my application): the load_ebook function in the calibre gui2 viewer main.py seems a good example.
https://github.com/bibihoma/calibre/...viewer/main.py (load_ebook function). This suggests that I should rather use calibre.ebooks.oeb.iterator.book to navigate within a book.
Any comments on which approach is best?

In case someone has read this far,]

the short question is: given an ebook path, how do I load the ebook into a Python structure and access its chapters and paragraphs in sequence?
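
To make the target concrete, here is a minimal sketch of the kind of structure I am after, for the .fb2 case only and using just the Python standard library rather than calibre. The function name iter_fb2_sections and the (title, paragraphs) shape are illustrative assumptions, not part of any calibre API, and it assumes a FictionBook 2.0 file that uses the standard namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by FictionBook 2.0 documents (assumed for the input file).
FB2_NS = "{http://www.gribuser.ru/xml/fictionbook/2.0}"


def _text(elem):
    """Return the element's text with inline markup dropped and whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def iter_fb2_sections(source):
    """Yield (title, paragraphs) for every <section>, in document order.

    `source` is a file path or a binary file object. Nested sections are
    yielded separately, after their parent. Only paragraphs that are direct
    children of a section are collected, so a paragraph is never reported
    twice. Every <body> is visited, including a notes body if present.
    """
    root = ET.parse(source).getroot()
    for body in root.findall(FB2_NS + "body"):
        for section in body.iter(FB2_NS + "section"):
            title_el = section.find(FB2_NS + "title")
            title = _text(title_el) if title_el is not None else ""
            paragraphs = [_text(p) for p in section.findall(FB2_NS + "p")]
            yield title, paragraphs


# Usage (hypothetical path):
# for title, paragraphs in iter_fb2_sections("book.fb2"):
#     print(title, len(paragraphs))
```

What I would like from calibre is the same kind of iteration, but working for any input format instead of .fb2 only.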

Thanks, bibihoma

Post title: importing ebook and extracting content