|  10-19-2022, 11:04 AM | #1 | 
| Connoisseur            Posts: 71 Karma: 18500 Join Date: Apr 2013 Device: Kindle Touch, Paperwhite | 
				
				KOReader handles Internet Archive books poorly
			 
			
			And it seems not to be device-specific. It doesn't matter whether it is a flagship phone, a Kobo Aura, or a Kindle PW4: KOReader hangs or crashes pretty much every time I open one of these books. Can anything be done about it?
		 Last edited by MaxStirner; 10-19-2022 at 11:58 AM. | 
|   |   | 
|  10-20-2022, 07:05 AM | #2 | |
| cosiñeiro            Posts: 1,406 Karma: 2451781 Join Date: Apr 2014 Device: BQ Cervantes 4 | Quote: 
 In that case please put a link here pointing to one of the files that make the app hang/crash. Since you're talking about IA I'm assuming you're downloading books in the public domain. Most probably they are broken documents, but it is always interesting to learn from somebody else's errors   | |
|   |   | 
|  10-20-2022, 04:19 PM | #3 | 
| Connoisseur            Posts: 71 Karma: 18500 Join Date: Apr 2013 Device: Kindle Touch, Paperwhite | 
			
			No, I am thinking about PDFs. And even if KOReader does not ultimately crash or hang, it takes ages to render a page.
		 | 
|   |   | 
|  10-20-2022, 04:20 PM | #4 | 
| Resident Curmudgeon            Posts: 80,740 Karma: 150249619 Join Date: Nov 2006 Location: Roslindale, Massachusetts Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3 | |
|   |   | 
|  10-20-2022, 04:32 PM | #5 | 
| Connoisseur            Posts: 71 Karma: 18500 Join Date: Apr 2013 Device: Kindle Touch, Paperwhite | 
			
			Ok, but then my reasoning is this: even if e-ink devices do not have the resources to manage such files (not enough RAM, too weak a processor, etc.), could the process somehow be made faster on other devices like tablets or phones? They have tons of memory and should be able to deal with that.
		 | 
|   |   | 
|  10-20-2022, 05:10 PM | #6 | 
| cosiñeiro            Posts: 1,406 Karma: 2451781 Join Date: Apr 2014 Device: BQ Cervantes 4 | 
			
			Ok, so broken documents. Use any PDF reader based on Pdfium (like anything based on Chrome or almost anything based on Android). They can help, as they spawn multiple threads to render a single document and can handle multiple documents by spawning multiple processes (each one with multiple threads). That doesn't fix the nature of the documents: they will still be broken, and still slow to navigate or to jump between pages. If you want to read them using KOReader, your best bet is to convert them to DjVu. Or just reprint them with Ghostscript, tweaking some parameters. Or maybe there's a tool that's able to fix the utterly big images in them automagically or, at least, fix/convert the color space. | 
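The Ghostscript "reprint" route mentioned in this post could look roughly like the sketch below. It is only a guess at what "tweaking some parameters" might mean, not the poster's actual settings. The helper name `ghostscript_args` and the file names are made up for illustration, and the code only builds the command line so it can be inspected before anything is run. The switches themselves are standard pdfwrite options, but whether the JPEG 2000 images really get re-encoded may depend on the Ghostscript version, so it is worth checking the result with `pdfimages -list`.

```python
import shlex


def ghostscript_args(src, dst, gray=True):
    """Build a Ghostscript command that rewrites src into dst via pdfwrite.

    Hypothetical helper: the parameter choice is an assumption, not a
    setting confirmed anywhere in this thread.
    """
    args = [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        # Ask for plain JPEG (DCT) output for colour and gray images.
        "-dAutoFilterColorImages=false", "-dColorImageFilter=/DCTEncode",
        "-dAutoFilterGrayImages=false", "-dGrayImageFilter=/DCTEncode",
    ]
    if gray:
        # Convert everything to grayscale, which is enough for e-ink.
        args += ["-sColorConversionStrategy=Gray",
                 "-dProcessColorModel=/DeviceGray"]
    args += ["-sOutputFile=" + dst, src]
    return args


# Print the command instead of running it; to execute it, pass the list
# to subprocess.run(..., check=True) on a machine with Ghostscript installed.
print(shlex.join(ghostscript_args("in.pdf", "out.pdf")))
```

Expect the output to be considerably larger than the original, since the images are stored in a less compact encoding.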
|   |   | 
|  10-21-2022, 01:52 PM | #7 | 
| Connoisseur            Posts: 71 Karma: 18500 Join Date: Apr 2013 Device: Kindle Touch, Paperwhite | 
			
			Yes, I did a quick search of the KOReader issues. It turns out someone has already noticed the problem and has a possible solution; happy to see that I am not alone. Too bad that the issue looks frozen: https://github.com/koreader/koreader/issues/7992 Last edited by MaxStirner; 10-21-2022 at 01:56 PM. | 
|   |   | 
|  11-17-2024, 10:03 PM | #8 | 
| Member  Posts: 21 Karma: 10 Join Date: Sep 2013 Device: none | 
			
			Has anyone found a solution to view PDF documents from archive.org? I have no problems with other scanned PDFs that are 10x the size, but even 10 MB archive.org PDFs are impossibly slow to render (> 30 s/page). I tried the dejazap.js tool from the GitHub issue mentioned by @MaxStirner, but it does nothing and doesn't even change the document size. I tried both Mask and SMask. The original link is dead, but the file is probably this one: https://ghostscript.com/~tor/stuff/. I had to replace DeviceGray with mupdf.ColorSpace.DeviceGray to make it run. Script is attached below: Spoiler: 
 | 
|   |   | 
|  11-18-2024, 07:39 AM | #9 | 
| Zealot            Posts: 141 Karma: 33086 Join Date: Jan 2021 Device: Likebook Mars | 
			
			I read lots of PDFs from archive.org. I always pre-process them using K2pdfopt (https://www.willus.com/k2pdfopt/). File size will usually be larger, but they will load quickly and in a format more friendly for ereader screens. There's a bit of a learning curve though to get optimal parameters (depends on source document, target device screen size, and your preferences).
		 | 
|   |   | 
|  11-18-2024, 09:22 AM | #10 | 
| Enthusiast            Posts: 44 Karma: 14828 Join Date: Feb 2023 Device: Boox Page, Kobo Aura SE | 
			
			I think the problem is related to the image encoding of Archive.org PDF files. KOReader, or rather MuPDF, chokes on JPEG2000-encoded PDF files. There is a similar report from Sumatra: https://github.com/sumatrapdfreader/...df/issues/1922 One can export every page to PNG and then recombine all the files into a PDF, or use FineReader to accomplish this multi-step task. | 
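As a rough sketch of the "export every page to PNG, then recombine" route: the snippet below only builds the two command lines, using `pdftoppm` (poppler-utils) and `img2pdf` as stand-ins. Neither tool is named in this thread, so both are assumptions, as are the helper names and the 300 dpi default. Note that rasterizing this way drops the OCR text layer.

```python
import shlex


def rasterize_cmd(src, prefix, dpi=300):
    """Command that renders each page of src to <prefix>-<n>.png in grayscale."""
    return ["pdftoppm", "-png", "-gray", "-r", str(dpi), src, prefix]


def recombine_cmd(png_files, dst):
    """Command that packs the page images, in sorted order, into one PDF."""
    return ["img2pdf", *sorted(png_files), "-o", dst]


# Illustrative file names only; nothing is executed here.
print(shlex.join(rasterize_cmd("in.pdf", "page")))
print(shlex.join(recombine_cmd(["page-2.png", "page-1.png"], "out.pdf")))
```

Sorting by file name is only safe if the page numbers are zero-padded to the same width, so check the generated names on books with more than nine pages.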
|   |   | 
|  11-18-2024, 01:25 PM | #11 | ||
| Wizard            Posts: 1,784 Karma: 731691 Join Date: Oct 2014 Location: Antwerp Device: Kobo Aura H2O | Quote: 
 Quote: 
 | ||
|   |   | 
|  11-18-2024, 08:38 PM | #12 | 
| Member  Posts: 21 Karma: 10 Join Date: Sep 2013 Device: none | 
			
			So here are my results so far:
			- I have not managed to get anything useful out of the mutools script. This seems to be the way to go (just replace the layer): pdfimages -list clearly shows the 3 layers, 2 rgb and one gray, but neither version of the script does anything at all.
			- Acrobat Pro does not detect any background images or layers.
			These are the file sizes and render times for the different approaches:
			* Original file: 11 MB, 1 min/page or crash
			* Original file, printed through the ClawPDF driver: 172 MB, 1 s/page
			* Original file, printed through Windows "save as PDF": 100 MB, 1 s/page
			* Original file run through k2pdfopt, with default options or with -colorbg ffffff (which doesn't do anything): 72 MB, 1 s/page + nasty dot pattern
			* djvu, pdf2djvu --monochrome: 52 MB, 1 s/page, nasty dithering artifacts
			* djvu, pdf2djvu: 15 MB, 3 s/page
			* djvu, pdf2djvu.com: 11 MB, 40 s/page
			* mutools version: 11 MB, 1 min/page or crash
			* no djvu available on archive.org
			It's scary that a 172 MB PDF from ClawPDF renders much faster than the 11 MB archive.org original. Obviously none of the methods above removed the background from the scan.
			I've found some brute-force methods using ImageMagick [1], but that seems like the wrong approach. The PDF is already split into layers with OCR'ed text, so it doesn't make sense to me to flatten the PDF, save each page as an individual image, use a tool to try to separate the background from the text, do OCR, and then put everything back together. All of this to deal with lousy archive.org PDFs.
			[1] https://old.reddit.com/r/kindlescrib...slg/?context=3 | 
|   |   | 
|  11-20-2024, 06:56 AM | #13 | |
| Wizard            Posts: 1,784 Karma: 731691 Join Date: Oct 2014 Location: Antwerp Device: Kobo Aura H2O | Quote: 
 Swapping the colors in the .clear() commands and changing the mask to SMask takes care of that, and it is indeed a lot faster. Getting rid of the paper also improves render quality on eink.
Code:
if (scriptArgs.length < 2) {
	print("usage: mutool run dejavu.js input.pdf output.pdf");
	quit();
}
// On newer mutool builds (1.23 was reported) DeviceGray may need to be
// written as mupdf.ColorSpace.DeviceGray.
// 1x1 gray pixmaps: white for the paper background, black for the text.
var bgPix = new Pixmap(DeviceGray, [0,0,1,1], false);
var fgPix = new Pixmap(DeviceGray, [0,0,1,1], false);
bgPix.clear(255);
fgPix.clear(0);
var doc = new PDFDocument(scriptArgs[0]);
var bgImg = doc.addImage(new Image(bgPix));
for (var i = 0; i < doc.countPages(); ++i) {
	var page = doc.findPage(i);
	page.Resources.XObject.forEach(function (name, xobj) {
		var mask = xobj.SMask;
		if (mask) {
			// Image with a soft mask: keep the mask, paint it solid black.
			var fgImg = doc.addImage(new Image(fgPix, doc.loadImage(mask)));
			page.Resources.XObject[name] = fgImg;
		} else {
			// Image without a mask: replace it with plain white.
			page.Resources.XObject[name] = bgImg;
		}
	});
}
// Drop the now-unreferenced original images and compress the output.
doc.save(scriptArgs[1], "garbage=compact,compress");
 | |
|   |   | 
|  11-21-2024, 12:55 AM | #14 | 
| Member  Posts: 21 Karma: 10 Join Date: Sep 2013 Device: none | 
			
			For me the converted file looks exactly the same and is 16 kB larger (out of 11 MB). Which version of mutools did you use? I tried 1.23.10+ds1-1build3 on Linux and 1.23.0 on Windows. They create slightly different versions with no visible difference. I had to replace DeviceGray with mupdf.ColorSpace.DeviceGray for both.
			Maybe my PDF is different; I downloaded it a long time ago and can't find the original anymore. I'll see if I can find a publicly available PDF for comparison purposes.
			pdfimages shows the two images and the mask layer:
			1 0 image 1816 2925 rgb 3 8 jpx no 498 0 360 360 72.7K 0.5%
			1 1 image 1816 2925 rgb 3 8 jpx no 500 0 360 360 8067B 0.1%
			1 2 smask 1816 2925 gray 1 1 jbig2 no 500 0 360 360 47.3K 7.3% | 
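To compare files before and after a conversion, a listing like the one in this post can be summarized per image type and encoding. The sketch below parses `pdfimages -list` output; the helper names are hypothetical, and the header line in the sample is written from memory of poppler's output rather than taken from this thread. Only the three data rows come from the post.

```python
from collections import Counter

# Column names in the order `pdfimages -list` prints them (assumed layout).
COLUMNS = ["page", "num", "type", "width", "height", "color", "comp", "bpc",
           "enc", "interp", "object", "id", "x_ppi", "y_ppi", "size", "ratio"]


def parse_pdfimages_list(text):
    """Return one dict per image row, skipping the header and separator."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly 16 fields and start with a page number.
        if len(fields) != len(COLUMNS) or not fields[0].isdigit():
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def encoding_summary(rows):
    """Count images per (type, encoding) pair, e.g. ('image', 'jpx')."""
    return Counter((row["type"], row["enc"]) for row in rows)


SAMPLE = """\
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    1816  2925  rgb     3   8  jpx    no       498  0   360   360 72.7K 0.5%
   1     1 image    1816  2925  rgb     3   8  jpx    no       500  0   360   360 8067B 0.1%
   1     2 smask    1816  2925  gray    1   1  jbig2  no       500  0   360   360 47.3K 7.3%
"""

rows = parse_pdfimages_list(SAMPLE)
summary = encoding_summary(rows)
for (kind, enc), count in sorted(summary.items()):
    print(kind, enc, count)
```

If a conversion worked, the jpx entries should disappear from the summary of the output file; if the summary is unchanged, the images were not touched.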
|   |   | 
|  11-21-2024, 05:36 AM | #15 | 
| Wizard            Posts: 1,784 Karma: 731691 Join Date: Oct 2014 Location: Antwerp Device: Kobo Aura H2O | 
			
			The script I posted works in 1.21, the original version likely is intended for 1.19.
		 | 
|   |   | 