
Hello Calibre developers and community, may you all be well!
As a long-time user, I admire and appreciate the continuous effort that has made Calibre the indispensable application it is today. My sincere gratitude to all involved.
Gratitude aside, I would like to share a reflection on the reading experience that many here may recognize.
Through countless hours of reading, I've often found myself pausing, grappling with questions the page (or pages) alone cannot answer. Such unanswered questions frequently necessitate reaching for external tools - be it search engines or, increasingly, Large Language Models (LLMs) like ChatGPT - to reach deeper comprehension.
Think of extracting the logical structure of an argument in Spinoza's Ethics, or identifying the experimental findings of a highly technical paper in Science. Sure, one can meticulously re-read, cross-reference, and manually outline these works to eventually arrive at the "answers," but an LLM offers what traditional tools cannot: immediate, context-aware synthesis that drastically reduces the cognitive overhead and time spent in pursuit of genuine understanding. After all, effort saved is focus gained.
To expand briefly on the point: unlike static resources that provide isolated facts or definitions (e.g., search engines or dictionaries), an LLM can take a passage as complex as Kant's Critique of Pure Reason or as dense as a molecular biology research paper from Nature and synthesize its core ideas, explain complex relationships, or rephrase it in simpler terms, all while retaining important context.
This led me to a realization I have finally decided to act upon: while keeping a separate LLM window open (ChatGPT, Claude, Grok, Gemini, etc.) works, imagine the transformative potential if this capability were integrated directly into Calibre itself.
With that vision, I have developed and tested a new feature that integrates Google's Gemini API (which can be abstracted to any compatible LLM) directly into the Calibre E-book Viewer. My aim is to give users in-context AI tools without their having to leave the reading environment: instant text summarization, clarification of complex topics, grammar correction, translation, and more, enhancing both reading and research.
Key Features Implemented:
- Dockable Side Panel: persistent, dockable panel for direct interaction with selected text.
- Customizable Quick Actions: grid of user-configurable buttons for one-click common tasks (e.g., "Summarize," "Explain Simply").
- Custom Prompts: dedicated field for arbitrary, user-defined LLM queries.
- Quick Access: includes a global keyboard shortcut (Ctrl+Shift+G) to toggle the panel's visibility, and a button on the floating text selection bar for immediate access.
- Comprehensive Settings: robust settings dialog (accessible from the panel) for managing the API key, selecting LLM models, and fully customizing quick actions.
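To make the "customizable quick actions" idea concrete, here is a minimal sketch of how such actions might be modeled as prompt templates filled with the reader's selected text. All names and default prompts below are illustrative, not taken from the actual patch:

```python
# Hypothetical sketch: quick actions as a mapping from button labels to
# prompt templates. The real implementation stores these in Calibre's
# settings; the defaults here are illustrative only.

DEFAULT_QUICK_ACTIONS = {
    'Summarize': 'Summarize the following passage concisely:\n\n{text}',
    'Explain Simply': 'Explain the following passage in simple terms:\n\n{text}',
    'Fix Grammar': 'Correct the grammar of the following text:\n\n{text}',
    'Translate': 'Translate the following text into English:\n\n{text}',
}

def build_prompt(action_label, selected_text, actions=DEFAULT_QUICK_ACTIONS):
    """Fill the chosen quick action's template with the selected text."""
    template = actions[action_label]
    return template.format(text=selected_text)
```

One nice property of this design is that user-defined actions and built-in ones are handled identically: a custom action is just another label-to-template entry.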
Implementation & Stability:
Stability has been a priority throughout development. Earlier attempts using a multi-file architecture or the Calibre plugin system resulted in untraceable startup crashes in the viewer process. Learning from this, I confined the feature to src/calibre/gui2/viewer/ui.py and a small number of co-located modules (e.g., gemini_panel.py, gemini_settings.py), within which all the necessary UI classes (panel, settings dialogs) and API-integration logic are defined.
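For reviewers curious about the API-integration side, the shape of a Gemini request follows Google's public generateContent REST endpoint. The helper below is only a sketch of how such a request might be assembled (no network call is made here); the function and constant names are mine, not from the patch:

```python
import json

# Google's public Gemini REST endpoint for single-turn text generation.
GEMINI_ENDPOINT = ('https://generativelanguage.googleapis.com/v1beta/'
                   'models/{model}:generateContent')

def build_request(model, prompt, api_key):
    """Return (url, headers, body) for a single-turn Gemini request.

    Hypothetical helper for illustration: the payload shape follows the
    documented generateContent API, where the prompt is wrapped in a
    list of 'contents', each holding a list of 'parts'.
    """
    url = GEMINI_ENDPOINT.format(model=model)
    headers = {
        'Content-Type': 'application/json',
        'x-goog-api-key': api_key,  # the key may also be passed as ?key=...
    }
    body = json.dumps({
        'contents': [{'parts': [{'text': prompt}]}],
    })
    return url, headers, body
```

In the actual feature, the request is issued from a worker thread so the viewer's UI stays responsive while waiting for the model's reply.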
Here is a screenshot of the feature in action (attached as well):

Here is a screenshot of its settings panel (also attached):
The Ask:
I have the complete, working code for ui.py (and the additional files for the panel/settings/configuration within src/calibre/gui2/viewer/) ready for review. Before proceeding with a formal pull request on GitHub, I wanted to present this proposal to the community to gauge interest and gather initial feedback. I believe this feature would be a valuable and frequently used addition to Calibre's capabilities and would be delighted to guide it through the contribution process.
Thank you for your time and consideration, amazing Calibre community.
Best regards,
Amir