Old 10-19-2022, 11:04 AM   #1
MaxStirner
Connoisseur
Posts: 71
Karma: 18500
Join Date: Apr 2013
Device: Kindle Touch, Paperwhite
KOReader is poor at handling Internet Archive books

And it seems not to be device-specific. It doesn't matter whether it is a flagship phone, a Kobo Aura, or a Kindle PW4: it hangs or crashes pretty much every time I open one of these books. Can anything be done about it?

Last edited by MaxStirner; 10-19-2022 at 11:58 AM.
Old 10-20-2022, 07:05 AM   #2
pazos
cosiñeiro
Posts: 1,406
Karma: 2451781
Join Date: Apr 2014
Device: BQ Cervantes 4
Quote:
Originally Posted by MaxStirner
And it seems not to be device-specific. It doesn't matter whether it is a flagship phone, a Kobo Aura, or a Kindle PW4: it hangs or crashes pretty much every time I open one of these books. Can anything be done about it?
"Internet Archive books" is not a MIME type the app understands. I'm assuming you're talking about EPUBs.

In that case, please put a link here pointing to one of the files that makes the app hang or crash. Since you're talking about IA, I'm assuming you're downloading books in the public domain.

Most probably they are broken documents, but it is always interesting to learn from somebody else's errors.
Old 10-20-2022, 04:19 PM   #3
MaxStirner
Connoisseur
No, I am thinking of PDFs. And even if KOReader does not ultimately crash or hang, it takes ages to render a page.
Old 10-20-2022, 04:20 PM   #4
JSWolf
Resident Curmudgeon
Posts: 79,708
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by MaxStirner
No, I am thinking of PDFs. And even if KOReader does not ultimately crash or hang, it takes ages to render a page.
Aren't Internet Archive PDFs just images? If so, then that's why they are so slow.
Old 10-20-2022, 04:32 PM   #5
MaxStirner
Connoisseur
Quote:
Originally Posted by JSWolf
Aren't Internet Archive PDFs just images? If so, then that's why they are so slow.
OK, but then my reasoning is this: even if e-ink devices do not have the resources to manage such files (not enough RAM, too weak a processor, etc.), could the process somehow be made faster on other devices like tablets or phones? They have tons of memory and should be able to deal with that.
Old 10-20-2022, 05:10 PM   #6
pazos
cosiñeiro
OK, so broken documents.

Use any PDF reader based on PDFium (like anything based on Chrome, or almost anything based on Android). They can help, as they spawn multiple threads to render a single document and can handle multiple documents by spawning multiple processes (each one with multiple threads).

That doesn't fix the nature of the documents. They will still be broken and will still be slow to navigate or to jump between pages.

If you want to read them using KOReader, your best bet is to convert them to DjVu. Or just reprint them with Ghostscript, tweaking some parameters. Or maybe there's a tool that's able to fix the utterly big images in them automagically or, at least, fix/convert the color space.
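
For the Ghostscript route, a rough sketch of what "reprinting with tweaked parameters" could look like is below. It only builds the argument list in JavaScript (runnable with Node.js) and prints the command. The pdfwrite switches are standard Ghostscript options, but the 150 dpi target, the grayscale conversion, the function name and the file names are assumptions picked for illustration, not settings anyone in this thread has tested on these files.

```javascript
// Build a Ghostscript argument list that re-distills a PDF with downsampled,
// grayscale images. The dpi value and the file names are placeholders.
function ghostscriptArgs(input, output, dpi) {
	return [
		"-sDEVICE=pdfwrite",                // write a new PDF
		"-dNOPAUSE",
		"-dBATCH",
		"-sColorConversionStrategy=Gray",   // convert color content to grayscale
		"-dProcessColorModel=/DeviceGray",
		"-dDownsampleColorImages=true",     // shrink oversized color images
		"-dColorImageResolution=" + dpi,
		"-dDownsampleGrayImages=true",
		"-dGrayImageResolution=" + dpi,
		"-sOutputFile=" + output,
		input
	];
}

var args = ghostscriptArgs("in.pdf", "out.pdf", 150);
console.log("gs " + args.join(" "));
```

Whether the result renders faster on a given device would have to be measured; the sketch only shows which knobs exist.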
Old 10-21-2022, 01:52 PM   #7
MaxStirner
Connoisseur
Yes, I did a quick search of the KOReader issues. It turns out someone has already noticed the problem and has a possible solution. Happy to see that I am not alone. Too bad that it looks like the issue is frozen.
https://github.com/koreader/koreader/issues/7992

Last edited by MaxStirner; 10-21-2022 at 01:56 PM.
Old 11-17-2024, 10:03 PM   #8
DanCa
Member
Posts: 21
Karma: 10
Join Date: Sep 2013
Device: none
Has anyone found a solution to view pdf documents from archive.org?

I have no problems with other scanned PDFs that are 10x the size, but even 10 MB archive.org PDFs are impossibly slow to render (> 30 s/page).

I tried the dejazap.js tool from the GitHub issue mentioned by @MaxStirner, but it does nothing and doesn't even change the document size. I tried both Mask and SMask.
The original link is dead, but the file is probably this one: https://ghostscript.com/~tor/stuff/. I had to replace DeviceGray with mupdf.ColorSpace.DeviceGray to make it run. The script is below.

This script is supposed to remove the foreground and background images. It does not do anything for me.
Run it with 'mutool run dejazap.js scourgeormonthly01crui.pdf out.pdf'.
mutool comes with MuPDF.
Code:
// Extract the image masks from DjVu-like PDF files and create a new monochrome
// PDF from them.
//
// This assumes that each page consists of three full page images:
//   * A full color background image.
//   * A full color foreground image.
//   * A black and white selection mask.
//
// The background image typically holds the white page color, the foreground
// image holds the ink color, and the mask selects whether the foreground ink
// or background paper shows for a given pixel.
//
// This allows the background and foreground images to be encoded with an
// algorithm where the compressor can ignore the foreground ink pixels when
// compressing the background image, and vice versa, accomplishing much higher
// compression ratios since all the high-frequency data is moved to the
// selection mask which is compressed using a black&white algorithm.
//
// Typically these files are created with JPEG2000 compression for the full
// color images, which is very slow to decompress. The selection mask is then
// compressed with JBIG2 which is also quite slow.
//
// If we create a new PDF file containing only the selection masks drawn as
// monochrome images, we can usually render these files much faster, and they
// look nicer since the muddy colors are removed and the text is nice and
// crisp.
//
// There is of course the danger of losing actual color images in the file!

if (scriptArgs.length < 2) {
	print("usage: mutool run dejavu.js input.pdf output.pdf");
	quit();
}

var bgPix = new Pixmap(mupdf.ColorSpace.DeviceGray, [0,0,1,1], false);
var fgPix = new Pixmap(mupdf.ColorSpace.DeviceGray, [0,0,1,1], false);
bgPix.clear(0);
fgPix.clear(255);

var doc = new PDFDocument(scriptArgs[0]);
var bgImg = doc.addImage(new Image(bgPix));
for (var i = 0; i < doc.countPages(); ++i) {
	var page = doc.findPage(i);
	page.Resources.XObject.forEach(function (name, xobj) {
		// var mask = xobj.Mask;
		var mask = xobj.SMask;
		if (mask) {
			var fgImg = doc.addImage(new Image(fgPix, doc.loadImage(mask)));
			page.Resources.XObject[name] = fgImg;
		} else {
			page.Resources.XObject[name] = bgImg;
		}
	});
}
doc.save(scriptArgs[1], "garbage=compact,compress");
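
For anyone trying to picture the three-layer layout that the script's header comment describes, here is a minimal sketch in plain JavaScript (runnable with Node.js, no MuPDF involved). Short arrays of gray values stand in for real images, with 0 assumed to mean black and 255 white. The function names and sample values are made up for this example; it illustrates the idea and is not how MuPDF renders internally.

```javascript
// Per-pixel compositing for a page stored as background, foreground and mask.
// mask[i] === 1 selects the foreground ink, 0 selects the background paper.
function compositePage(background, foreground, mask) {
	return mask.map(function (bit, i) {
		return bit ? foreground[i] : background[i];
	});
}

// What the script aims for: drop both color layers and paint the mask
// with two flat colors instead.
function maskOnlyPage(mask, paperGray, inkGray) {
	return mask.map(function (bit) {
		return bit ? inkGray : paperGray;
	});
}

var mask = [0, 1, 1, 0];
var muddyPaper = [231, 228, 230, 233];
var muddyInk = [40, 35, 38, 42];

var original = compositePage(muddyPaper, muddyInk, mask); // 231, 35, 38, 233
var crisp = maskOnlyPage(mask, 255, 0);                   // 255, 0, 0, 255
var inverted = maskOnlyPage(mask, 0, 255);                // 0, 255, 255, 0

console.log(original, crisp, inverted);
```

The last call shows why the two flat colors matter: with paper set to 0 and ink set to 255, the same mask yields a black page with white text.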
Old 11-18-2024, 07:39 AM   #9
jonnyl
Zealot
Posts: 135
Karma: 33084
Join Date: Jan 2021
Device: Likebook Mars
I read lots of PDFs from archive.org. I always pre-process them using K2pdfopt (https://www.willus.com/k2pdfopt/). File size will usually be larger, but they will load quickly and in a format more friendly for ereader screens. There's a bit of a learning curve though to get optimal parameters (depends on source document, target device screen size, and your preferences).
Old 11-18-2024, 09:22 AM   #10
nezih
Enthusiast
Posts: 43
Karma: 14828
Join Date: Feb 2023
Device: Boox Page, Kobo Aura SE
I think the problem is related to the image encoding of Archive.org PDF files. KOReader, or MuPDF, chokes on JPEG2000-encoded PDF files. Similar report from Sumatra: https://github.com/sumatrapdfreader/...df/issues/1922

One can export every page to PNG and then recombine all the files into a PDF, or use FineReader to accomplish this multi-step task.
Old 11-18-2024, 01:25 PM   #11
Frenzie
Wizard
Posts: 1,750
Karma: 730681
Join Date: Oct 2014
Location: Antwerp
Device: Kobo Aura H2O
Quote:
Has anyone found a solution to view pdf documents from archive.org?
Where it's available, download the DjVu instead if you intend to use it on an ereader.

Quote:
I think the problem is related to the image encoding of Archive.org PDF files. KOReader, or MuPDF, chokes on JPEG2000-encoded PDF files.
It's also simply a gigantic image.
Old 11-18-2024, 08:38 PM   #12
DanCa
Member
So here are my results so far:

- I have not managed to get anything useful out of the mutool script. It still seems to be the way to go: just replace the layers. pdfimages -list clearly shows the 3 layers, 2 RGB and one gray, but the script in the two versions does absolutely nothing.

- Acrobat Pro does not detect any background images or layers.

These are the file sizes and render times for the different approaches:

* Original file: 11 MB, 1 min/page or crash
* Original file, printed through the clawPDF driver: 172 MB, 1 s/page
* Original file, printed through Windows "save as PDF": 100 MB, 1 s/page
* Original file run through k2pdfopt, with default options or with -colorbg ffffff (which doesn't do anything): 72 MB, 1 s/page, plus a nasty dot pattern
* DjVu from pdf2djvu --monochrome: 52 MB, 1 s/page, nasty dithering artifacts
* DjVu from pdf2djvu: 15 MB, 3 s/page
* DjVu from pdf2djvu.com: 11 MB, 40 s/page
* mutool script output: 11 MB, 1 min/page or crash
* No DjVu available on archive.org

It's scary that a 172 MB pdf from ClawPDF renders much faster than the 11MB archive.org original.

Obviously none of the methods above removed the background from the scan.
I've found some brute force methods using ImageMagick [1], but that seems like the wrong approach. The pdf is already split into layers with OCR'ed text, it doesn't make sense to me to flatten the pdf, save each page as an individual image, use a tool to try to separate the background from the text, do OCR, and then put everything back together. All of this to deal with lousy archive.org pdfs.

[1] https://old.reddit.com/r/kindlescrib...slg/?context=3
Old 11-20-2024, 06:56 AM   #13
Frenzie
Wizard
Quote:
but the script in the two versions does absolutely nothing.
Define nothing. It comes out black for me, which isn't nothing.

Swapping the colors in the .clear() commands and changing the mask to SMask takes care of that, and it is indeed a lot faster. Getting rid of the paper also improves render quality on eink.

Code:
if (scriptArgs.length < 2) {
	print("usage: mutool run dejavu.js input.pdf output.pdf");
	quit();
}

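// 1x1 gray pixmaps used as flat colors: white paper, black ink.
var bgPix = new Pixmap(DeviceGray, [0,0,1,1], false);
var fgPix = new Pixmap(DeviceGray, [0,0,1,1], false);
bgPix.clear(255); // background: white
fgPix.clear(0); // foreground: black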

var doc = new PDFDocument(scriptArgs[0]);
var bgImg = doc.addImage(new Image(bgPix));
for (var i = 0; i < doc.countPages(); ++i) {
	var page = doc.findPage(i);
	page.Resources.XObject.forEach(function (name, xobj) {
		var mask = xobj.SMask;
		if (mask) {
			var fgImg = doc.addImage(new Image(fgPix, doc.loadImage(mask)));
			page.Resources.XObject[name] = fgImg;
		} else {
			page.Resources.XObject[name] = bgImg;
		}
	});
}
doc.save(scriptArgs[1], "garbage=compact,compress");
Old 11-21-2024, 12:55 AM   #14
DanCa
Member
Quote:
Originally Posted by Frenzie
Define nothing. It comes out black for me, which isn't nothing.
For me the converted file looks exactly the same and is 16 kB larger (out of 11 MB).

Which version of mutool did you use?

I tried 1.23.10+ds1-1build3 on Linux and 1.23.0 on Windows. They create slightly different files with no visible difference.

I had to replace DeviceGray with mupdf.ColorSpace.DeviceGray for both.

Maybe my PDF is different; I downloaded it a long time ago and can't find the original anymore. I'll see if I can find a publicly available PDF for comparison purposes.

pdfimages -list shows the two images and the mask layer:

page num  type  width height color comp bpc enc   interp object ID x-ppi y-ppi  size ratio
   1   0  image  1816   2925 rgb      3   8 jpx   no        498  0   360   360 72.7K  0.5%
   1   1  image  1816   2925 rgb      3   8 jpx   no        500  0   360   360 8067B  0.1%
   1   2  smask  1816   2925 gray     1   1 jbig2 no        500  0   360   360 47.3K  7.3%
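
To put numbers on that listing, here is a small sketch in plain JavaScript (runnable with Node.js, no MuPDF needed) that parses rows in the pdfimages -list format and estimates how much raw pixel data each layer expands to once decoded. The function names are made up for this example, and the column order (page, num, type, width, height, color, comp, bpc, enc, ...) is assumed to match the rows above.

```javascript
// Parse the data rows of `pdfimages -list` output. Header and separator
// lines are skipped because their first field is not a number.
function parsePdfimagesList(text) {
	return text.split("\n")
		.map(function (line) { return line.trim().split(/\s+/); })
		.filter(function (f) { return f.length >= 16 && /^\d+$/.test(f[0]); })
		.map(function (f) {
			var width = Number(f[3]), height = Number(f[4]);
			var comp = Number(f[6]), bpc = Number(f[7]);
			return {
				page: Number(f[0]),
				type: f[2],
				enc: f[8],
				megapixels: width * height / 1e6,
				// Uncompressed size: bytes per row (rounded up) times rows.
				decodedBytes: Math.ceil(width * comp * bpc / 8) * height
			};
		});
}

// Two full-page images plus one soft mask is the layout discussed in this thread.
function looksLikeThreeLayerPage(rows) {
	var images = rows.filter(function (r) { return r.type === "image"; });
	var masks = rows.filter(function (r) { return r.type === "smask"; });
	return images.length === 2 && masks.length === 1;
}

var sample = [
	"1 0 image 1816 2925 rgb 3 8 jpx no 498 0 360 360 72.7K 0.5%",
	"1 1 image 1816 2925 rgb 3 8 jpx no 500 0 360 360 8067B 0.1%",
	"1 2 smask 1816 2925 gray 1 1 jbig2 no 500 0 360 360 47.3K 7.3%"
].join("\n");

var rows = parsePdfimagesList(sample);
rows.forEach(function (r) {
	console.log(r.type, r.enc, r.megapixels.toFixed(1) + " MP", r.decodedBytes + " bytes decoded");
});
console.log("three-layer page:", looksLikeThreeLayerPage(rows));
```

For the page above, each 1816 x 2925 RGB layer works out to roughly 15.9 MB of raw pixels, against 72.7K and 8067B as stored, so the decoder has a lot of expanding to do per page regardless of how small the file is.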
Old 11-21-2024, 05:36 AM   #15
Frenzie
Wizard
The script I posted works in 1.21; the original version is likely intended for 1.19.