06-25-2026, 07:32 AM   #1
LegeApps
Junior Member
Posts: 6
Karma: 10
Join Date: Jun 2026
Device: Kobo Clara HD
Selective binarization and re-encoding program for use with KOReader

Hi all, this is a program I have been working on since early last year.

Problem: you get a scan from your physical scanner or from Archive.org. The file is 500+ MB and the scanned pages are yellowed and worn, making the contrast look wrong on a black-and-white e-ink reader.

Solution: intelligent binarization of such raster scans, and re-encoding to a 1-bit fax format for a 90% file size reduction at a resolution you select. Final output can be PDF or DjVu, as well as EPUB (EPUB is actually processed differently, using intensive OCR).
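
For readers wondering where the savings from 1-bit output come from, here is a rough sketch of the arithmetic. It compares uncompressed sizes only; the page dimensions are hypothetical, and the real reduction depends on how the source scan was compressed and on the JBIG2 or DjVu encoder. The function names are made up for illustration and are not part of Lege.

```python
from typing import Sequence


def raw_size_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Uncompressed size of one page; each row is padded to a whole byte."""
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


def pack_row(bits: Sequence[int]) -> bytes:
    """Pack a row of 0/1 pixels into bytes, most significant bit first."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


# Hypothetical page: 2550 x 3300 px (US Letter at 300 dpi).
color_bytes = raw_size_bytes(2550, 3300, 24)  # 24-bit color
mono_bytes = raw_size_bytes(2550, 3300, 1)    # 1-bit black and white
print(color_bytes, mono_bytes, round(color_bytes / mono_bytes, 1))
```

Before any compression, going from 24 bits to 1 bit per pixel already shrinks the raw page by a factor of about 24; the bi-level encoder then compresses the packed rows further.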

I made the program to work directly with my Kobo Clara HD + KOReader, which supports PDF/JBIG2 and DjVu.

Features:

GUI and a separate command-line interface. Page range selection, center/crop margins, or reflow (similar to k2pdfopt, but much faster). Two OCR modes are available. Many specialized debug options in the CLI. Custom ONNX Runtime engine and custom JBIG2, JP2, and DjVu encoders.

Everything runs fast and automatically, unlike ScanTailor et al. Open source.

Let me know what you think here, or email read@legeapp.com, with bug reports or suggestions.

www.legeapp.com

https://apps.microsoft.com/detail/9N...&ocid=pdpshare

https://github.com/LegeApp/Lege
06-25-2026, 11:18 AM   #2
Frenzie
Wizard
Posts: 1,825
Karma: 731691
Join Date: Oct 2014
Location: Antwerp
Device: Kobo Aura H2O, Kobo Libra 2
That's pretty neat. Is the Claude-assisted code any good or total spaghetti?
06-25-2026, 02:01 PM   #3
JSWolf
Resident Curmudgeon
Posts: 83,912
Karma: 153649587
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Why is it that, in the Microsoft link, the Lege side of the sample looks like a bad photocopy?
06-25-2026, 02:04 PM   #4
mergen3107
Wizard
Posts: 1,616
Karma: 5000564
Join Date: Feb 2012
Location: Cape Canaveral
Device: Kindle Scribe
Err, because it is a bad photocopy? This is the whole point of the app
06-25-2026, 02:23 PM   #5
rantanplan
Weirdo
Posts: 1,132
Karma: 12503116
Join Date: Nov 2019
Location: Wuppertal, Germany
Device: Kobo Sage, Kobo Libra 2, reMarkable PaperPro
Quote:
Originally Posted by Frenzie
That's pretty neat. Is the Claude-assisted code any good or total spaghetti?
There's one way to find out...
06-25-2026, 02:41 PM   #6
JSWolf
Quote:
Originally Posted by mergen3107
Err, because it is a bad photocopy? This is the whole point of the app
Then there is no point to the app.
06-25-2026, 02:41 PM   #7
rantanplan
They failed at basic testing and just assumed that I had a required package installed in a fixed directory. Sigh.

[Attached image: Bildschirmfoto 2026-06-25 um 20.39.54.png]
06-25-2026, 02:49 PM   #8
rantanplan
And another dependency that they assumed I had installed: rdf. Really sloppy.
06-25-2026, 02:51 PM   #9
rantanplan
And more...

[Attached image: Bildschirmfoto 2026-06-25 um 20.52.27.png]

At this point it feels more like debugging, so I'll stop testing now.
06-25-2026, 11:14 PM   #10
LegeApps
Quote:
Originally Posted by Frenzie
That's pretty neat. Is the Claude-assisted code any good or total spaghetti?
Don't worry about the code unless you want to help with the project. Otherwise you're an end user, so give me end-user feedback.

Quote:
Why is it that, in the Microsoft link, the Lege side of the sample looks like a bad photocopy?
The program binarizes pages to 1-bit color. That means pure black and white pixels, down from 16 million colors. Originally the plan was to whiten the page colors while staying in the 16-million-color space, but that proved error-prone and unworkable, so now it binarizes. Then I discovered that binarization is an entire sub-field unto itself. DIBCO was a yearly competition among researchers to see who could binarize old texts the best for research purposes. The program uses some of the methods that did well in those contests. Both the fixed and the adaptive binarization have parameters that can be changed for your document.
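
For anyone unfamiliar with the difference between fixed and adaptive binarization, here is a minimal sketch. It is not Lege's code: the fixed variant is a plain global cutoff, and the adaptive variant is Sauvola's classic local threshold, swapped in purely as a well-known example. The parameter values (`t`, `window`, `k`, `r`) are illustrative assumptions, and the naive window loop is written for clarity rather than speed.

```python
import math
from typing import List

Gray = List[List[int]]  # rows of 0..255 grayscale values


def fixed_threshold(img: Gray, t: int = 128) -> Gray:
    """Global binarization: pixels darker than t become ink (1), the rest paper (0)."""
    return [[1 if px < t else 0 for px in row] for row in img]


def sauvola_threshold(img: Gray, window: int = 15, k: float = 0.2,
                      r: float = 128.0) -> Gray:
    """Adaptive binarization: each pixel is compared with a local threshold
    T = mean * (1 + k * (std / r - 1)) computed over its neighbourhood."""
    h, w = len(img), len(img[0])
    half = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the neighbourhood, clipped at the page borders.
            vals = [img[j][i]
                    for j in range(max(0, y - half), min(h, y + half + 1))
                    for i in range(max(0, x - half), min(w, x + half + 1))]
            mean = sum(vals) / len(vals)
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            t = mean * (1 + k * (std / r - 1))
            out[y][x] = 1 if img[y][x] < t else 0
    return out


# A dark, "yellowed" 7x7 patch (gray level 100) with one ink pixel (40) in the middle.
page = [[100] * 7 for _ in range(7)]
page[3][3] = 40
print(sum(map(sum, fixed_threshold(page))))              # ink pixels found by the fixed cutoff
print(sum(map(sum, sauvola_threshold(page, window=7))))  # ink pixels found by the adaptive one
```

On a patch like this, a fixed cutoff of 128 treats the darkened paper itself as ink, while the local threshold follows the background and keeps only the pixel that is dark relative to its surroundings. That is the usual argument for adaptive methods on unevenly aged scans.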

Quote:
Then there is no point to the app.
What do you wish it did? There isn't another app that does a better job. ScanTailor also binarizes, as do k2pdfopt and PDF-XChange. They came to the same conclusion about the infeasibility of working in color space.

Quote:
They failed at basic testing and just assumed that I had a required package installed in a fixed directory. Sigh.
Unless you want to help with the project, there is no point in compiling from source. If you did want to help, you would already know how to fix those issues. The GUI is a local fork of Freya, modified for various things I needed it to do, as described in the Cargo.toml; you can find that fork at https://github.com/LegeApp/freya. Otherwise, download a release from https://github.com/LegeApp/Lege/releases. I take the time to package all the files the program needs together; compiling the binary isn't enough to get the program to run. The Leptonica error is because you don't have Tesseract installed, which the README on the GitHub page says you need.
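
For anyone who does build from source, a small pre-flight check can catch a missing external tool before the build fails. This is only a sketch: the helper name `missing_tools` is hypothetical, the tool list is an assumption based on the Tesseract requirement mentioned above, and finding the `tesseract` binary on PATH is just a proxy, since a build may also need the Tesseract and Leptonica development libraries.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(names: Iterable[str],
                  which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the tools from `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    # "tesseract" comes from the post above; the list itself is an assumption.
    missing = missing_tools(["tesseract"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; by default the function uses the standard library's `shutil.which`.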

I did use LLMs to make it, but the releases are solid and none of it is sloppy. It is free, open source software; if you want to help, let me know. Otherwise download a release, use it like any other program, and leave the source code to developers.
06-26-2026, 12:52 AM   #11
LegeApps
Hi all, the GitHub repo now builds as-is after cloning. However, if you modify the binary for some purpose of your own, you still need the files in the Release zips to get the program to work.

Otherwise there is no reason to build the binaries from source, since the most recent versions are included in the Releases, and the ONNX files, the PDFium library, and other files are needed to run the program. Continue to let me know what you think.

I use the program to prepare my own e-ink files and it works great for me at this point, but there's always some improvement or change that could be made.

I will update soon so that a GitHub clone also builds on macOS. Currently it only supports Windows and Linux, but macOS should be a quick tweak (PDFium library detection).
06-26-2026, 06:46 PM   #12
embryo
Fanatic
Posts: 533
Karma: 64554
Join Date: Aug 2013
Device: Kobo Glo, GloHD
Thank you for your app. It seems really handy.

Some feedback/observations:

This dropdown menu is cropped by the size of the window.


How is the selection of color parts made?
In some pages 2 out of 3 images can be colored or any combination can happen (all color, all BW). Can I avoid it for some pages or for the whole file?

Although the mouse cursor shows no drag-and-drop block over the app's window, dropping a PDF from Explorer is not supported.

I'll play with it some more..
Attached Thumbnails: bug 1.png, bug 2.png

Last edited by embryo; 06-26-2026 at 06:56 PM.
06-26-2026, 11:27 PM   #13
LegeApps
Quote:
Originally Posted by embryo
Thank you for your app. It seems really handy.

Some feedback/observations:

This dropdown menu is cropped by the size of the window..

How is the selection of color parts made?
In some pages 2 out of 3 images can be colored or any combination can happen (all color, all BW). Can I avoid it for some pages or for the whole file?

Although there is no drag'n'drop mouse block on the app's window, dropping a pdf from explorer is not supported.

I'll play with it some more..
Hey, thanks for your feedback; I am working on improvements based on it now. I am sorry that the menu stopped working correctly after recent changes. I am just going to get rid of it entirely and make the target resolution a manual setting, with a default of 1200. Those menu choices can also get dated fast. I may consider a preset save option in its place.

Yes, you can avoid the image detection: just turn layout detection off and it won't run the ONNX layout-detection model on each page. Disabling layout detection on only some pages is a good idea; it's just a question of not overcomplicating the GUI, but I will consider how to do it.

The incorrect detection on that page is because the YOLO model returned an expanded detection. The YOLO model struggles with full-page detections; the Paddle model I was using before fared better, but it is very slow with WGPU, so I replaced it. I'm not sure what to do about that. If you want to send more examples of bad detections, or a PDF that it messes up on, that would help. Otherwise I'll figure it out.
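
One possible mitigation, offered purely as an idea and not as what Lege does or plans to do, is a post-filter that discards any detected region covering nearly the whole page. The `Box` type, the `drop_oversized` name, and the 0.85 cutoff are all assumptions made up for this sketch.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned detection box in page pixel coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float


def drop_oversized(boxes: List[Box], page_w: float, page_h: float,
                   max_fraction: float = 0.85) -> List[Box]:
    """Keep only boxes whose area is at most max_fraction of the page area."""
    page_area = page_w * page_h
    kept = []
    for b in boxes:
        area = max(0.0, b.x1 - b.x0) * max(0.0, b.y1 - b.y0)
        if area <= max_fraction * page_area:
            kept.append(b)
    return kept


detections = [Box(0, 0, 1000, 1400), Box(100, 200, 500, 600)]
print(drop_oversized(detections, page_w=1000, page_h=1400))
```

The obvious cost is that a genuine full-page plate or photograph would be dropped as well, so a filter like this would need to be optional or paired with another cue.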

Drag-and-drop support has been added and will be uploaded later. Also check out the CLI, which is actually easier to use overall.
Yesterday, 10:44 AM   #14
LegeApps
Those changes have been made in the 3 most recent commits, and the Linux universal release has been updated, along with a few more GUI tweaks. The page exclusion didn't make it into the GUI: I tested it and it never looked right, so it's in the CLI only. EPUB creation is now in the GUI, though; it was CLI-only before. The Windows releases are not updated yet.

For the record, the Windows version is faster and better: DirectX 12 is superior to Vulkan, and WinOCR is faster and more accurate than Tesseract. Those are the kinds of tradeoffs you have to make when designing a cross-OS program. I appreciate all the input so far.
Yesterday, 12:38 PM   #15
embryo
My second try was with another PDF, which never finished processing.
The first time it ran, it processed the file up to about 70%, but then it just disappeared.
There was no trace in the log (BTW, can we have an option to clear the log?), no error dialog, nothing.
Every other try I made with the same file ran very quickly up to about 70% and then disappeared too.
After a Windows restart, the first try again processed normally for a few moments and then disappeared, and every try after that was again a very fast run.

I'll try some other pdf later..