Hi!
[ skip this if you are in a hurry ....
I am using calibre for months (without any plans to dig into its code) and recently got the idea of an application helping to learn vocabulary, using ebooks as a data base of "in context" translations.
Unfortunately, my development skills are a bit rust and it is taking me longer than I though to develop this django application. Also my few tentative of developping myself a .fb2 paragraph and section extractor demonstrate, that I would better re-use what was already done.
Anyway... enough context, let's get into the request itself: calibre is not only a great library/ebook converter/... , it also seems to be the python reference for ebook content extraction. Unfortunately, it is not published as a standalone module, and its code is just huge!
My understanding is that everybook will be mapped to ebooks.oeb.base at some point in the conversion chain. So according to you, shall I try to instanciate ebooks.oeb.base and use it extract ebook information? If so, I would appreciate if you could redirect me to information that could help/similar code if you know some.
Alternatively, I tried to have a look at the Calibre viewer as it requires to access the ebook content (like my application): the calibre gui2 viewer main.py - load_ebook function seems a good example.
https://github.com/bibihoma/calibre/...viewer/main.py ( load_ebook function). This suggest that I should rathermore use calibre.ebooks.oeb.iterator.book to navigate within a book.
Any comment on what is the best approach?
In case someone reads this post until this point,]
the short question is: given an ebook path, how to load the ebook in a python structure and access its chapters and paragraphs in sequence?
Thanks, bibihoma