#1 |
|
Junior Member
Posts: 6
Karma: 10
Join Date: Jun 2026
Device: Kobo Clara HD
|
Hi all, this is a program I have been working on since early last year.
Problem: you get a scan from your physical scanner or from Archive.org. The file is 500+ MB and the scanned pages are yellowed and worn, so the contrast looks wrong on a black-and-white e-ink reader.
Solution: intelligent binarization of such raster scans and re-encoding to 1-bit fax format, for a 90% file size reduction at a resolution you select. The final output can be PDF or DjVu, and EPUB is also possible (EPUB is actually processed differently, using intensive OCR). I made the program to work directly with my Kobo Clara HD + KoReader, which supports PDF/JBIG2 and DjVu.
Features: GUI and a separate command line interface. Page range selection, center/crop margins, or Reflow (similar to K2PDFOPT but much faster). 2 modes of OCR available. Many specialized debug options in the CLI. Custom ONNXRuntime engine; custom JBIG2, JP2, and DjVu encoders. Everything works fast and automatically, unlike ScanTailor et al. Open source.
Let me know what you think here, or email read@legeapp.com with bug reports or suggestions.
www.legeapp.com
https://apps.microsoft.com/detail/9N...&ocid=pdpshare
https://github.com/LegeApp/Lege
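For readers wondering what "binarize, then store at 1 bit per pixel" means in practice, here is a minimal sketch. It is not Lege's code: the post only calls the binarization "intelligent", so this uses Otsu's classic global threshold as a stand-in, and plain bit-packing rather than real fax or JBIG2 compression. All function names are made up for the illustration.

```python
def otsu_threshold(pixels):
    """Return the 0-255 gray level that best separates dark ink from light paper.

    Classic Otsu: pick the threshold that maximizes between-class variance.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels at or below the candidate threshold
    sum_dark = 0    # sum of their gray levels
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue
        w_light = total - w_dark
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(rows):
    """Map 8-bit grayscale rows to rows of bits, 1 = ink, 0 = paper."""
    t = otsu_threshold([p for row in rows for p in row])
    return [[1 if p <= t else 0 for p in row] for row in rows]


def pack_rows(bit_rows):
    """Pack each row of bits into bytes, most significant bit first.

    Rows are padded to a whole byte, as 1-bit raster formats usually do.
    """
    out = bytearray()
    for row in bit_rows:
        for i in range(0, len(row), 8):
            chunk = row[i:i + 8]
            byte = 0
            for bit in chunk:
                byte = (byte << 1) | bit
            out.append(byte << (8 - len(chunk)))
    return bytes(out)


if __name__ == "__main__":
    # A tiny "yellowed" strip: paper around 200, ink around 40.
    page = [[200, 205, 40, 45, 198, 200, 40, 205]]
    packed = pack_rows(binarize(page))
    print("gray bytes:", sum(len(r) for r in page), "packed bytes:", len(packed))
```

Going from 8-bit grayscale to 1 bit per pixel is already an 8x reduction in raw size before any compression; a bilevel codec such as JBIG2 then compresses the packed bits further. A global threshold like this tends to struggle on unevenly lit or stained pages, which is presumably where a smarter, local method earns its keep.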
|
|
|
|
|
#2 |
|
Wizard
Posts: 1,825
Karma: 731691
Join Date: Oct 2014
Location: Antwerp
Device: Kobo Aura H2O, Kobo Libra 2
|
That's pretty neat. Is the Claude-assisted code any good or total spaghetti?
|
|
|
|
|
|
#3 |
|
Resident Curmudgeon
Posts: 83,912
Karma: 153649587
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Why is it that, in the Microsoft link, the Lege side of the sample looks like a bad photocopy?
|
|
|
|
|
|
#4 |
|
Wizard
Posts: 1,616
Karma: 5000564
Join Date: Feb 2012
Location: Cape Canaveral
Device: Kindle Scribe
|
Err, because it is a bad photocopy? This is the whole point of the app.
|
|
|
|
|
|
#5 |
|
Weirdo
Posts: 1,132
Karma: 12503116
Join Date: Nov 2019
Location: Wuppertal, Germany
Device: Kobo Sage, Kobo Libra 2, reMarkable PaperPro
|
|
|
|
|
|
|
#6 |
|
Resident Curmudgeon
Posts: 83,912
Karma: 153649587
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
|
|
|
|
|
|
#7 |
|
Weirdo
Posts: 1,132
Karma: 12503116
Join Date: Nov 2019
Location: Wuppertal, Germany
Device: Kobo Sage, Kobo Libra 2, reMarkable PaperPro
|
They failed at basic testing and just assumed that I had a required package installed in a fixed directory. Sigh.
|
|
|
|
|
|
#8 |
|
Weirdo
Posts: 1,132
Karma: 12503116
Join Date: Nov 2019
Location: Wuppertal, Germany
Device: Kobo Sage, Kobo Libra 2, reMarkable PaperPro
|
And another dependency that they assumed I had installed: rdf. Really sloppy.
|
|
|
|
|
|
#9 |
|
Weirdo
Posts: 1,132
Karma: 12503116
Join Date: Nov 2019
Location: Wuppertal, Germany
Device: Kobo Sage, Kobo Libra 2, reMarkable PaperPro
|
|
|
|
|
|
|
#10 | ||||
|
Junior Member
Posts: 6
Karma: 10
Join Date: Jun 2026
Device: Kobo Clara HD
|
I did use LLMs to make it, but the releases are solid and none of it is sloppy. It is free, open source software; if you want to help, let me know. Otherwise download a release, use it like any other program, and leave the source code to developers.
||||
|
|
|
|
|
#11 |
|
Junior Member
Posts: 6
Karma: 10
Join Date: Jun 2026
Device: Kobo Clara HD
|
Hi all, the GitHub repo will now build as-is after a clone. However, if you modify the binary for some purpose of your own, you still need the files in the Release zips to get the program to work.
Otherwise there is no reason to build the binaries from source: the most recent versions are included in the Releases, and the ONNX files, pdfium library and other files are needed to run the program. Continue to let me know what you think. I use the program to prepare my own e-ink files and it works great for me at this point, but there's always some improvement or change that could be made. I will update soon for final macOS GitHub clone compatibility; currently it only supports Windows and Linux, but macOS would be a quick tweak (pdfium library detection).
|
|
|
|
|
#12 |
|
Fanatic
Posts: 533
Karma: 64554
Join Date: Aug 2013
Device: Kobo Glo, GloHD
|
Thank you for your app. It seems really handy.
Some feedback/observations:
This dropdown menu is cropped by the size of the window.
How is the selection of color parts made? On some pages 2 out of 3 images can be colored, or any other combination can happen (all color, all BW). Can I avoid it for some pages or for the whole file?
Although there is no drag'n'drop mouse block on the app's window, dropping a PDF from Explorer is not supported.
I'll play with it some more.
Last edited by embryo; 06-26-2026 at 06:56 PM.
|
|
|
|
|
#13 | |
|
Junior Member
Posts: 6
Karma: 10
Join Date: Jun 2026
Device: Kobo Clara HD
|
Yes, you can avoid the image detection: just turn layout detection off and it won't use the ONNX model to run layout detection per page. Disabling layout detection on some pages only is a good idea; the concern is overcomplicating the GUI, but I will consider how to do it.
The incorrect detection on that page happens because the YOLO model returns an expanded detection. The YOLO model struggles with full-page detections. The Paddle model I was using before fared better, but it is very slow with WGPU, so I replaced it. I'm not sure what to do about that. If you want to send more examples of bad detections, or a PDF that it messes up on, that would help. Otherwise I'll figure it out.
Drag and drop support has been added and will be uploaded later. Check out the CLI also, which is actually easier to use overall.
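On the expanded-detection problem, one possible guard is to discard any "image" box that covers nearly the whole page. This is only an illustration and is not taken from Lege's code: the `Box` type, the `drop_oversized` name, and the 0.9 cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned detection box in page pixel coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Clamp to zero so a malformed box never yields a negative area.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def drop_oversized(boxes, page_w, page_h, max_page_fraction=0.9):
    """Keep only boxes whose area is at most max_page_fraction of the page."""
    page_area = page_w * page_h
    if page_area <= 0:
        raise ValueError("page dimensions must be positive")
    return [b for b in boxes if b.area() / page_area <= max_page_fraction]


if __name__ == "__main__":
    detections = [Box(0, 0, 100, 150), Box(10, 10, 50, 60)]
    print(drop_oversized(detections, page_w=100, page_h=150))
```

The trade-off is that a genuine full-page illustration would be dropped too and end up binarized with the text, so a cutoff like this would probably need to be adjustable or combined with the per-page exclusion idea.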
|
|
|
|
|
|
#14 |
|
Junior Member
Posts: 6
Karma: 10
Join Date: Jun 2026
Device: Kobo Clara HD
|
Those changes have been made in the 3 most recent commits, along with a few more GUI tweaks, and the Linux universal release has been updated. The page exclusion didn't make it into the GUI; I tested it and it never looked right, so it's in the CLI only. EPUB creation, which was CLI-only before, is now in the GUI. The Windows releases are not updated yet.
For the record, the Windows version is faster and better: DirectX 12 is superior to Vulkan, and WinOCR is faster and more accurate than Tesseract. Those are the kinds of tradeoffs you have to make when designing a cross-OS program. I appreciate all the input so far.
|
|
|
|
|
#15 |
|
Fanatic
Posts: 533
Karma: 64554
Join Date: Aug 2013
Device: Kobo Glo, GloHD
|
My second try was with another PDF, which never finished processing.
The 1st time it ran, it processed the file up to ~70%, but then the app just disappeared. No trace in the log (BTW, can we have an option to clear the log?), no error dialog, nothing. Every other try I made with the same file did a very fast run up to ~70% and then disappeared too. After a Windows restart I got the same normal processing for a few moments on the 1st try, then the disappearance again, followed by very fast runs on every try after that. I'll try some other PDF later.
|
|
|