MobileRead Forums > E-Book Software > KOReader

Yesterday, 07:32 AM   #1
LegeApps
Junior Member
LegeApps began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Jun 2026
Device: Kobo Clara HD
Selective binarization and re-encoding program for use with KOReader

Hi all, this is a program I have been working on since early last year.

Problem: you get a scan from your own scanner or from Archive.org; the file is 500+ MB, and the scanned pages are yellowed and worn, which makes the contrast look wrong on a black-and-white e-ink reader.

Solution: intelligent binarization of such raster scans, then re-encoding to 1-bit fax format for roughly a 90% file size reduction, at a resolution you select. Output can be PDF or DjVu, and EPUB is also possible (EPUB is actually processed differently, using intensive OCR).
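For a sense of where the size reduction comes from, here is a minimal pure-Python sketch (not Lege's encoder) that packs a binarized page at 1 bit per pixel and compares raw, uncompressed sizes. The 6 x 9 inch page at 300 dpi is a hypothetical example. Packing alone gives an 8x reduction over 8-bit grayscale and 24x over 24-bit RGB. Bilevel codecs such as JBIG2 then compress the packed data further, while source scans are often already JPEG-compressed, so real-world savings will vary.

```python
def pack_bits(rows):
    """Pack rows of 0/1 pixels (1 = black) into bytes, most significant bit first.

    Each row is padded with zero bits to a whole byte, as in common 1-bit raster layouts.
    """
    out = bytearray()
    for row in rows:
        for start in range(0, len(row), 8):
            chunk = row[start:start + 8]
            byte = 0
            for bit in chunk:
                byte = (byte << 1) | (bit & 1)
            byte <<= 8 - len(chunk)  # pad a final partial byte with zeros
            out.append(byte)
    return bytes(out)


def raw_page_sizes(width, height):
    """Uncompressed size in bytes of one page at three bit depths."""
    return {
        "rgb_24bit": width * height * 3,
        "gray_8bit": width * height,
        "bilevel_1bit": ((width + 7) // 8) * height,
    }


if __name__ == "__main__":
    # Hypothetical 6 x 9 inch page scanned at 300 dpi.
    for name, size in raw_page_sizes(1800, 2700).items():
        print(f"{name}: {size / 1e6:.2f} MB")
    print(pack_bits([[1, 0, 1, 1, 0, 0, 0, 0, 1]]).hex())
```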

I made the program to work directly with my Kobo Clara HD + KOReader, which supports PDF/JBIG2 and DjVu.

Features:

There is a GUI and a separate command-line interface. Options include page range selection, centering/cropping margins, and Reflow (similar to k2pdfopt but much faster). Two OCR modes are available, and the CLI has many specialized debug options. The program uses a custom ONNX Runtime engine and custom JBIG2, JP2, and DjVu encoders.
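The thread doesn't describe how Lege detects margins or segments text for reflow. As a rough, hypothetical illustration of the general idea on an already-binarized page (a list of rows, 1 = black), counting ink per row and per column (projection profiles) is enough to find the content box for cropping or centering, and to split the page into text lines, which is one early step in k2pdfopt-style reflow. All names below are illustrative, not Lege's.

```python
# Tiny hypothetical binarized page: 1 = black ink, 0 = white paper.
SAMPLE_PAGE = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def content_box(page):
    """(top, bottom, left, right) of the inked area, inclusive, or None for a blank page."""
    rows = [y for y, row in enumerate(page) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(page[0])) if any(row[x] for row in page)]
    return rows[0], rows[-1], cols[0], cols[-1]


def crop_to_content(page, pad=0):
    """Crop away blank margins, keeping up to `pad` pixels of white around the content."""
    box = content_box(page)
    if box is None:
        return page
    top, bottom, left, right = box
    top, left = max(0, top - pad), max(0, left - pad)
    bottom = min(len(page) - 1, bottom + pad)
    right = min(len(page[0]) - 1, right + pad)
    return [row[left:right + 1] for row in page[top:bottom + 1]]


def line_spans(page):
    """Group consecutive inked rows into (first_row, last_row) text-line spans."""
    spans, start = [], None
    for y, row in enumerate(page):
        if any(row) and start is None:
            start = y
        elif not any(row) and start is not None:
            spans.append((start, y - 1))
            start = None
    if start is not None:
        spans.append((start, len(page) - 1))
    return spans


if __name__ == "__main__":
    print(content_box(SAMPLE_PAGE))
    print(crop_to_content(SAMPLE_PAGE))
    print(line_spans(SAMPLE_PAGE))
```

Centering then amounts to placing the cropped block on a page with equal white margins. Real scans need noise filtering (specks, dark scanner edges) before a bounding box like this is trustworthy.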

Everything is fast and automatic, unlike ScanTailor et al. It is open source.

Let me know what you think here, or email bug reports and suggestions to read@legeapp.com.

www.legeapp.com

https://apps.microsoft.com/detail/9N...&ocid=pdpshare

https://github.com/LegeApp/Lege
Yesterday, 11:18 AM   #2
Frenzie
Wizard
Frenzie ought to be getting tired of karma fortunes by now.
 
Posts: 1,825
Karma: 731691
Join Date: Oct 2014
Location: Antwerp
Device: Kobo Aura H2O, Kobo Libra 2
That's pretty neat. Is the Claude-assisted code any good or total spaghetti?
Yesterday, 02:01 PM   #3
JSWolf
Resident Curmudgeon
JSWolf ought to be getting tired of karma fortunes by now.
 
Posts: 83,895
Karma: 153649587
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Why does the Lege side of the sample in the Microsoft link look like a bad photocopy?
Yesterday, 02:04 PM   #4
mergen3107
Wizard
mergen3107 ought to be getting tired of karma fortunes by now.
 
Posts: 1,612
Karma: 5000564
Join Date: Feb 2012
Location: Cape Canaveral
Device: Kindle Scribe
Err, because it is a bad photocopy? This is the whole point of the app.
Yesterday, 02:23 PM   #5
rantanplan
Weirdo
rantanplan ought to be getting tired of karma fortunes by now.
 
Posts: 1,129
Karma: 12503116
Join Date: Nov 2019
Location: Wuppertal, Germany
Device: Kobo Sage, Kobo Libra 2, reMarkable PaperPro
Quote:
Originally Posted by Frenzie
That's pretty neat. Is the Claude-assisted code any good or total spaghetti?
There's one way to find out...
Yesterday, 02:41 PM   #6
JSWolf
Resident Curmudgeon
Quote:
Originally Posted by mergen3107
Err, because it is a bad photocopy? This is the whole point of the app.
Then there is no point to the app.
Yesterday, 02:41 PM   #7
rantanplan
Weirdo
They failed at basic testing and just assumed that I had a required package installed in a fixed directory. Sigh.

[Attached screenshot: Bildschirmfoto 2026-06-25 um 20.39.54.png]
Yesterday, 02:49 PM   #8
rantanplan
Weirdo
And another dependency they assumed I had installed: rdf. Really sloppy.
Yesterday, 02:51 PM   #9
rantanplan
Weirdo
And more...

[Attached screenshot: Bildschirmfoto 2026-06-25 um 20.52.27.png]

At this point it feels more like debugging and I'll stop testing now.
Yesterday, 11:14 PM   #10
LegeApps
Junior Member
Quote:
Originally Posted by Frenzie
That's pretty neat. Is the Claude-assisted code any good or total spaghetti?
Don't worry about the code unless you want to help with the project. Otherwise you're an end user, so give me end-user feedback.

Quote:
Why does the Lege side of the sample in the Microsoft link look like a bad photocopy?
The program binarizes pages to 1-bit color, which means black and white pixels only, down from 16 million colors. Originally the plan was to whiten colors in the 16-million-color space, but that proved error-prone and unworkable, so now it binarizes. Then I discovered that binarization is an entire subfield unto itself. DIBCO (the Document Image Binarization Contest) was a yearly competition among researchers to see who could best binarize old texts for research purposes. The program uses some of the methods that did well in those contests. Both the fixed and the adaptive binarization modes have parameters that can be tuned for your document.
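The thread doesn't say which DIBCO-era methods Lege implements, so the following is only an illustration of the fixed-versus-adaptive distinction, in pure Python on an 8-bit grayscale page given as a list of rows. The adaptive half uses Sauvola's local threshold, T = m * (1 + k * (s / R - 1)) with m and s the local mean and standard deviation, a classic method often used as a baseline in document binarization evaluations. The function names and defaults (threshold=128, window=15, k=0.2, r=128) are assumptions, not Lege's parameters.

```python
import math


def binarize_fixed(gray, threshold=128):
    """Global threshold: pixels darker than `threshold` become black (1), the rest white (0)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def _integral(gray, squared=False):
    """Summed-area table (of values or squared values) with a leading zero row and column."""
    h, w = len(gray), len(gray[0])
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            v = gray[y][x]
            row_sum += v * v if squared else v
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def _box_sum(table, y0, y1, x0, x1):
    """Sum over rows y0..y1-1 and columns x0..x1-1 using a summed-area table."""
    return table[y1][x1] - table[y0][x1] - table[y1][x0] + table[y0][x0]


def binarize_sauvola(gray, window=15, k=0.2, r=128.0):
    """Sauvola adaptive threshold computed over a window x window neighborhood."""
    h, w = len(gray), len(gray[0])
    sums, sq_sums = _integral(gray), _integral(gray, squared=True)
    half = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            n = (y1 - y0) * (x1 - x0)
            mean = _box_sum(sums, y0, y1, x0, x1) / n
            var = max(0.0, _box_sum(sq_sums, y0, y1, x0, x1) / n - mean * mean)
            threshold = mean * (1 + k * (math.sqrt(var) / r - 1))
            out[y][x] = 1 if gray[y][x] <= threshold else 0
    return out


if __name__ == "__main__":
    # Hypothetical yellowed page: background at gray level 200, one dark stroke at 60.
    page = [[60 if x == 10 else 200 for x in range(20)] for _ in range(20)]
    result = binarize_sauvola(page)
    print(sum(map(sum, result)), "black pixels")
```

Because Sauvola's threshold follows the local mean, a uniformly yellowed background stays white without hand-picking a global cutoff. The trade-off is sensitivity to the window size and k, plus artifacts near sharp changes in illumination.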

Quote:
Then there is no point to the app.
What do you wish it did? There isn't another app that does a better job. ScanTailor also binarizes, as do k2pdfopt and PDF-XChange. They came to the same conclusion about the infeasibility of working in color space.

Quote:
They failed at basic testing and just assumed that I had a required package installed in a fixed directory. Sigh.
Unless you want to help with the project, there is no point in compiling from source. If you did want to help, you would already know how to fix those issues: the GUI uses a local fork of Freya, modified for various things I needed it to do, as described in the Cargo.toml. You can find that fork at https://github.com/LegeApp/freya. Otherwise, download a release from https://github.com/LegeApp/Lege/releases. I take the time to package all the files the program needs together; compiling the binary isn't enough to get the program to run. The Leptonica error is because you don't have Tesseract installed, which the README on the GitHub page says you need.
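A missing Tesseract install tends to surface as a confusing downstream library error. As a hypothetical illustration (not code from Lege), a launcher could check for required executables up front with Python's `shutil.which`. This assumes Tesseract is reachable on PATH under the name `tesseract`, which may not hold for every Windows install.

```python
import shutil

# Hypothetical list of external programs to check for; the README says Tesseract is required.
REQUIRED_EXECUTABLES = ["tesseract"]


def missing_executables(names=REQUIRED_EXECUTABLES):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Missing required programs: " + ", ".join(missing))
    else:
        print("All required programs found.")
```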

I did use LLMs to make it, but the releases are solid and none of it is sloppy. It is free, open-source software; if you want to help, let me know. Otherwise, download a release, use it like any other program, and leave the source code to developers.
Today, 12:52 AM   #11
LegeApps
Junior Member
Hi all, the GitHub repo now builds as-is after cloning. However, if you modify the binary for some purpose of your own, you still need the files in the release zips to get the program to work.

Otherwise, there is no reason to build the binaries from source: the most recent versions are included in the releases, and the ONNX files, the pdfium library, and other files are needed to run the program. Continue to let me know what you think.

I use the program to prepare my own e-ink files, and it works great for me at this point, but there's always some improvement or change that could be made.

I will update soon for macOS compatibility when building from a GitHub clone. Currently it only supports Windows and Linux, but macOS should be a quick tweak (pdfium library detection).
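Platform-specific library detection usually comes down to the file name: PDFium builds are typically named `pdfium.dll` on Windows, `libpdfium.so` on Linux, and `libpdfium.dylib` on macOS. A minimal hypothetical sketch of such a lookup follows; the function names and the search directories are assumptions, not Lege's code.

```python
import os
import sys

# Conventional PDFium shared-library names, keyed by sys.platform prefix.
PDFIUM_NAMES = {
    "win32": "pdfium.dll",
    "linux": "libpdfium.so",
    "darwin": "libpdfium.dylib",
}


def pdfium_filename(platform=sys.platform):
    """Return the expected PDFium library file name for a sys.platform value."""
    for prefix, name in PDFIUM_NAMES.items():
        if platform.startswith(prefix):
            return name
    raise RuntimeError(f"unsupported platform: {platform}")


def find_pdfium(search_dirs, platform=sys.platform):
    """Return the first existing PDFium library path in search_dirs, or None."""
    name = pdfium_filename(platform)
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical search order: next to the running script, then the current directory.
    here = os.path.dirname(os.path.abspath(sys.argv[0]))
    print(find_pdfium([here, os.getcwd()]) or "PDFium library not found")
```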


All times are GMT -4. The time now is 09:34 AM.


MobileRead.com is a privately owned, operated and funded community.