Quote:
Originally Posted by kollo
 For security and stability, I have purchased a new domain name. Please visit the new website: https://ebookcc.com

Hello everyone,
I wanted to share a major update regarding EbookCC, the browser-based tool designed to detect, translate, and overlay selectable text onto comic speech bubbles. Thanks for all the initial thoughts and interest!
Here is what is new in the latest version:
📂 1. Open Source!
https://github.com/PP002/ebookcc
EbookCC is now fully open source and hosted on GitHub. You can check out the source code, run it locally on your machine, or contribute directly to the repository. Running it locally avoids any cloud network restrictions and removes the round trip to a remote server, so your local tools respond without added network delay.
Try the Cloud Version:
https://ebookcc.com/
💻 2. Full Local Model Support (Ollama & LM Studio)
You can now run EbookCC entirely offline and privately using standard self-hosted APIs on your computer:
LM Studio & Ollama Integrations: Simply plug in your local URL (like http://127.0.0.1:1234/v1 for LM Studio or http://127.0.0.1:11434/v1 for Ollama). EbookCC connects to them directly from your browser, so requests stay on your machine and skip the cloud round trip.
CORS & Connection Guidance: We have integrated friendly setup checklists directly into the settings window to guide you through environment setups (such as setting OLLAMA_ORIGINS="*").
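For anyone curious what a browser-to-local-model connection roughly looks like, here is a minimal sketch. It assumes the OpenAI-compatible routes that LM Studio and Ollama expose under /v1 (GET /models and POST /chat/completions). The function names, the prompt wording, and the hint text are my own placeholders for illustration and are not taken from the EbookCC source.

```typescript
// Sketch only: names here are hypothetical, not EbookCC's actual code.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<any> }>;

type ProbeResult =
  | { kind: "ok" }
  | { kind: "http-error"; status: number }
  | { kind: "unreachable-or-cors"; hint: string };

// Strip trailing slashes so ".../v1/" and ".../v1" behave the same.
function normalizeBaseUrl(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "");
}

// Ask the server for its model list to confirm it is reachable from the page.
// In a browser, a blocked CORS request and a server that is not running both
// surface as a rejected fetch, so the two cases share one result.
async function probeLocalEndpoint(
  baseUrl: string,
  fetchImpl: FetchLike = fetch
): Promise<ProbeResult> {
  try {
    const res = await fetchImpl(`${normalizeBaseUrl(baseUrl)}/models`);
    return res.ok ? { kind: "ok" } : { kind: "http-error", status: res.status };
  } catch {
    return {
      kind: "unreachable-or-cors",
      hint: 'Check that the server is running and allows this origin (for Ollama, e.g. OLLAMA_ORIGINS="*").',
    };
  }
}

// Send one piece of bubble text to the local model and return the translation.
async function translateText(
  baseUrl: string,
  model: string,
  text: string,
  targetLang: string,
  fetchImpl: FetchLike = fetch
): Promise<string> {
  const res = await fetchImpl(`${normalizeBaseUrl(baseUrl)}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      temperature: 0,
      messages: [
        { role: "system", content: `Translate the user's text into ${targetLang}. Reply with the translation only.` },
        { role: "user", content: text },
      ],
    }),
  });
  if (!res.ok) throw new Error(`Local model returned HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

The fetch function is passed in as a parameter only so the sketch can be exercised without a running server. In the browser you would call probeLocalEndpoint("http://127.0.0.1:11434/v1") and let it use the built-in fetch.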
⚡ 3. Gemini Context Caching for Speed and Savings
For those using the cloud API, I have introduced Gemini Context Caching. Because comic books contain consistent panel contexts, caching the previous translation templates can save 50–80% on token overhead and reduces response latency.
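To give a rough idea of the mechanism, here is a sketch of the two request bodies involved: one that stores the shared context once, and one that refers back to it for each bubble. It assumes the Gemini REST API's cachedContents resource and the cachedContent field on generateContent as I understand them, so please check the field names against the current Gemini docs. The helper names are hypothetical and this is not necessarily how EbookCC implements it.

```typescript
// Sketch only: helper names are hypothetical, not EbookCC's actual code.
const GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta";

interface GeminiPart { text: string }
interface GeminiContent { role: "user" | "model"; parts: GeminiPart[] }

// Body for POST {GEMINI_BASE}/cachedContents: the instructions and shared
// context (glossary, character names, style notes) are uploaded once.
function buildCacheCreateBody(
  model: string,
  instructions: string,
  sharedContext: string,
  ttlSeconds: number
) {
  const contents: GeminiContent[] = [{ role: "user", parts: [{ text: sharedContext }] }];
  return {
    model: `models/${model}`,
    systemInstruction: { parts: [{ text: instructions }] },
    contents,
    ttl: `${ttlSeconds}s`, // duration string, e.g. "600s"
  };
}

// Body for POST {GEMINI_BASE}/models/{model}:generateContent: only the new
// bubble text is sent, plus a reference to the cached context by name.
function buildGenerateBody(cacheName: string, bubbleText: string) {
  const contents: GeminiContent[] = [{ role: "user", parts: [{ text: bubbleText }] }];
  return { cachedContent: cacheName, contents };
}

// Create the cache and return its resource name ("cachedContents/...").
async function createCache(
  apiKey: string,
  body: ReturnType<typeof buildCacheCreateBody>
): Promise<string> {
  const res = await fetch(`${GEMINI_BASE}/cachedContents`, {
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Cache creation failed with HTTP ${res.status}`);
  const data = await res.json();
  return data.name;
}
```

The savings come from the second body being small: the long shared prompt is billed at the cached rate instead of being resent in full with every request. The cache and the later generateContent calls need to use the same model.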
👥 4. Direct Feedback & Local Model Recommendations
I have been testing local setups with lightweight models like Gemma 4B / Gemma 2, which perform very well on the translation step! However, smaller quantized models in the 4B-8B range sometimes lose a bit of spatial/multimodal coordinate precision for OCR bubble-finding compared to large cloud-hosted models.
I need your feedback: If you are testing local workflows, please let me know which models are working best for you!
Are you using llama3.2-vision, qwen2.5vl (Qwen2.5-VL), or something else? I would love to hear your recommendations on a solid local model that strikes a good balance between text recognition accuracy and speed on consumer GPUs.
Thank you all for the support. I look forward to hearing your thoughts!