Quote:
Originally Posted by kovidgoyal
Looks fine to me, am sure some people will find it useful, like the Lookup panel already present in the viewer. I assume this requires the user to configure which LLM they want to interact with? How are the api keys/passwords whatever stored? Does it support querying a local LLM?
One concern is that it should also be implemented in the content server viewer, though that can be done in a later iteration.
Kovid, thank you for your direct engagement and specific questions! Practicality, security, and future compatibility are exactly the concerns I hope to address, and hearing them raised by Calibre's lead developer is inspiring. Thank you for the great work.
Judging by my poll (as of this writing), the Calibre development community may not see much value in this feature. That may partly be a wording problem in the answer options (perhaps I should have dropped "significantly" and just left it at "enhanced"). In any case, to answer your questions:
- Yes, the user configures the LLM. The Gemini panel has a settings icon that, when clicked, opens a dedicated dialog where the user selects their preferred model from a predefined list (currently Gemini 1.5 Pro and Gemini 1.5 Flash, as these are the "stable," general-purpose models available via Google's Generative Language API). The dropdown displays human-readable names, while the backend uses the specific model IDs (attachment shown; I also attached a screenshot of the "quick actions" editing). A sketch of this name-to-ID mapping follows after this list.
- The API key is stored using Calibre's existing 'vprefs' system (calibre.gui2.viewer.config.vprefs). If I understand Calibre's architecture correctly, that means the key lands in the viewer's JSON configuration file inside the user's Calibre configuration directory (relevant screenshot attached). This leverages Calibre's established preference management; the one caveat is that, like the other viewer preferences, the key is stored as plain text, so it is only as protected as the user's own configuration files. (A short sketch of the save/load path also follows after this list.)
- Currently, my implementation supports only cloud-based LLMs reachable via HTTP API calls (right now, just Google's Generative Language API endpoint). It does not support querying a local LLM directly. Integrating local LLMs (e.g., Ollama, a local llama.cpp web server, etc.) would require a different API client or local process communication, which I could certainly add! My only question is whether that is a valuable future enhancement. (The HTTP sketch after this list shows both request shapes.)
- Kovid, I apologize, but what do you mean by the "content server viewer"? I just looked it up in the Calibre manual, and it sounds like a very cool feature I did not know about (viewing books in a browser!). I would be happy to implement this feature there (or anything adjacent) if there is interest in proceeding.
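For clarity, here is a minimal sketch of how the settings dialog maps the friendly model names to API model IDs. It assumes calibre's qt.core Qt wrapper from a source build; the class and attribute names here are illustrative, not my exact code:

[CODE]
# Minimal sketch: human-readable model names mapped to API model IDs.
# Assumes calibre's qt.core Qt wrapper (available in a source build).
from qt.core import QComboBox, QDialog, QFormLayout, QLineEdit

MODELS = {
    'Gemini 1.5 Pro': 'gemini-1.5-pro',
    'Gemini 1.5 Flash': 'gemini-1.5-flash',
}

class GeminiSettingsDialog(QDialog):

    def __init__(self, parent=None):
        super().__init__(parent)
        layout = QFormLayout(self)
        self.model_box = QComboBox(self)
        for display_name, model_id in MODELS.items():
            # The dropdown shows the friendly name; the model ID rides
            # along as the item's user data
            self.model_box.addItem(display_name, model_id)
        self.api_key_edit = QLineEdit(self)
        self.api_key_edit.setEchoMode(QLineEdit.EchoMode.Password)
        layout.addRow('Model:', self.model_box)
        layout.addRow('API key:', self.api_key_edit)

    @property
    def selected_model_id(self):
        return self.model_box.currentData()
[/CODE]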
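And here is the save/load path through vprefs; the 'gemini_api_key' pref name is simply what I chose, nothing established:

[CODE]
# Minimal sketch of persisting the key via the viewer's existing vprefs
# store. vprefs is a JSONConfig-backed dict, so assignment writes it to
# the viewer's JSON config file (viewer-webengine.json, if I read the
# source correctly) in the calibre configuration directory, in plain text.
from calibre.gui2.viewer.config import vprefs

def save_api_key(key):
    vprefs['gemini_api_key'] = key

def load_api_key():
    return vprefs.get('gemini_api_key', '')
[/CODE]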
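Finally, the HTTP shape of the cloud call, with the local Ollama shape for comparison; the endpoints and JSON bodies follow the public docs for each service, while the helper names are mine and real code needs error handling and should run off the GUI thread:

[CODE]
# Minimal sketch of the cloud call, following Google's public Generative
# Language API (v1beta).
import json
from urllib.request import Request, urlopen

def ask_gemini(prompt, model_id, api_key):
    url = ('https://generativelanguage.googleapis.com/v1beta/models/'
           f'{model_id}:generateContent?key={api_key}')
    body = json.dumps({'contents': [{'parts': [{'text': prompt}]}]}).encode('utf-8')
    req = Request(url, data=body, headers={'Content-Type': 'application/json'})
    with urlopen(req, timeout=60) as resp:
        data = json.loads(resp.read().decode('utf-8'))
    # First candidate's text; real code must handle errors and safety blocks
    return data['candidates'][0]['content']['parts'][0]['text']

def ask_ollama(prompt, model_id='llama3'):
    # Local LLM comparison: Ollama's default server speaks a similar
    # JSON-over-HTTP dialect, so local support is largely a URL/payload swap
    body = json.dumps({'model': model_id, 'prompt': prompt,
                       'stream': False}).encode('utf-8')
    req = Request('http://localhost:11434/api/generate', data=body,
                  headers={'Content-Type': 'application/json'})
    with urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode('utf-8'))['response']
[/CODE]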
Lastly, the entire feature, including the 'GeminiPanel' and 'GeminiSettingsDialog' classes, is contained within 'ui.py' and two co-located modules ('gemini_panel.py' and 'gemini_settings.py'). In my testing it has proven highly stable and avoids the crashes I previously hit with top-level imports, since I now defer those imports (sketch below).
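For the curious, this is the shape of the deferred-import pattern I mean, again with illustrative names; the point is simply that the panel module is imported when the panel is first opened, not at viewer startup:

[CODE]
# Illustrative sketch of the deferred-import pattern: the panel module is
# imported on first use instead of at viewer startup, so a problem in it
# cannot crash the viewer at import time.
class GeminiPanelOpener:

    def __init__(self):
        self._panel = None

    def open_panel(self, parent):
        if self._panel is None:
            # Import inside the method, not at the top of ui.py
            from gemini_panel import GeminiPanel
            self._panel = GeminiPanel(parent)
        self._panel.show()
        return self._panel
[/CODE]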
In closing, Kovid, given our friends' comments and the poll results, I am unsure whether this is worth moving forward with. However, I am very keen on contributing to Calibre, so if it is okay, please let me know what would be something "similar" (architecturally, conceptually, any "...ally") that I could help with.
Thank you again, Kovid, for the excellent work and thoughts!