Old 11-21-2024, 01:17 PM   #16
Frenzie
Wizard
Posts: 1,758
Karma: 731681
Join Date: Oct 2014
Location: Antwerp
Device: Kobo Aura H2O
Some quick experimentation shows that the key and value arguments passed to the forEach callback seem to be reversed compared to the older version.
Code:
// tested in MuPDF 1.24.3
if (scriptArgs.length < 2) {
	print("usage: mutool run dejavu.js input.pdf output.pdf");
	quit();
}

var bgPix = new Pixmap(ColorSpace.DeviceGray, [0,0,1,1], false);
var fgPix = new Pixmap(ColorSpace.DeviceGray, [0,0,1,1], false);
bgPix.clear(255);
fgPix.clear(0);

var doc = new PDFDocument(scriptArgs[0]);
var bgImg = doc.addImage(new Image(bgPix));
for (var i = 0; i < doc.countPages(); ++i) {
	var page = doc.findPage(i);
	page.Resources.XObject.forEach(function (xobj, name) {
		var mask = xobj.SMask;
		if (mask) {
			// In newer versions this seems to say "Error: truncated jbig2 segment header"
			// var fgImg = doc.addImage(new Image(fgPix, doc.loadImage(mask)));

			// Quick workaround for the above problem
			var fgImg = doc.graftObject(mask);

			page.Resources.XObject[name] = fgImg;
		} else {
			page.Resources.XObject[name] = bgImg;
		}
	});
}
doc.save(scriptArgs[1], "garbage=compact,compress");
Old 11-22-2024, 08:36 PM   #17
DanCa
Member
Posts: 21
Karma: 10
Join Date: Sep 2013
Device: none
The mutool version was the issue. 1.23 does not work, but 1.21 did. Thank you very much!
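Since the result depends this much on the mutool version, a batch script could check the version before touching any files. Here is a minimal sketch, assuming that `mutool -v` prints a banner like "mutool version 1.21.0" (the function names are my own, and treating only the 1.21 series as known good reflects just what was reported in this thread):

```python
import re
import subprocess


def parse_mutool_version(text):
    """Return (major, minor) from text such as 'mutool version 1.21.0', or None."""
    match = re.search(r'version\s+(\d+)\.(\d+)', text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def mutool_version(mutool='mutool'):
    """Ask the mutool binary for its version (assumes `mutool -v` prints it)."""
    result = subprocess.run([mutool, '-v'], capture_output=True, text=True)
    # The banner may go to stdout or stderr depending on the build, so check both.
    return parse_mutool_version(result.stdout + result.stderr)


def is_known_good(version):
    """True only for the 1.21 series, which this thread reports as working."""
    return version is not None and version[:2] == (1, 21)
```

A caller could then refuse to convert anything unless `is_known_good(mutool_version(mutool))` is true.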


I have added a script that converts all files in a given location. Warning: I made this to be run on my e-reader, so it doesn't keep the originals.

Code:
import os
import shutil
import subprocess
# Delete scanned JPX (JPEG 2000) image layers in archive.org pdfs
# Warning: This script does not keep the original files


# Tested with mupdf 1.21. Does not work with 1.23. 

# path to mutool.exe
mutool = r"C:\mupdf-1.21.0-windows\mutool.exe"

def checkForJPX(filename):
    '''Check if the first page of the file contains a JPX layer.'''
    info_output = subprocess.run([mutool, 'info', filename, '1'], capture_output=True)
    # Decode the captured bytes instead of searching the repr of a bytes object
    return '[ JPX ]' in info_output.stdout.decode(errors='replace')


def convertFile(filename):
    '''Remove scanned image layers from archive.org pdfs'''
    print('Working on: ', filename)
    if checkForJPX(filename):
        print('Trying to convert')
        tmpfile = filename + '_tmp.pdf'
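        result = subprocess.run([mutool, 'run', 'dejazap.js', filename, tmpfile], capture_output=True)
        # If mutool failed or wrote no output, leave the original untouched
        if result.returncode != 0 or not os.path.exists(tmpfile):
            print('ERROR, mutool run failed, keeping original file')
        elif checkForJPX(tmpfile):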
            print('ERROR, file still has JPX layer, keeping temp file')
        else:
            shutil.move(tmpfile, filename)
            print('                  ... file converted')
    else:
        print('Does not contain JPX')




# from https://gist.github.com/TheMatt2/faf5ca760c61a267412c46bb977718fa
def walklevel(path, depth = 1):
    """It works just like os.walk, but you can pass it a level parameter
       that indicates how deep the recursion will go.
       If depth is 1, the current directory is listed.
       If depth is 0, nothing is returned.
       If depth is -1 (or less than 0), the full depth is walked.
    """
    # If depth is negative, just walk
    # Not using yield from for python2 compat
    # and copy dirs to keep consistent behavior for depth = -1 and depth = inf
    if depth < 0:
        for root, dirs, files in os.walk(path):
            yield root, dirs[:], files
        return
    elif depth == 0:
        return

    # path.count(os.path.sep) is safe because
    # - On Windows "\\" is never allowed in the name of a file or directory
    # - On UNIX "/" is never allowed in the name of a file or directory
    # - On MacOS a literal "/" is quietly translated to a ":" so it is still
    #   safe to count "/".
    base_depth = path.rstrip(os.path.sep).count(os.path.sep)
    for root, dirs, files in os.walk(path):
        yield root, dirs[:], files
        cur_depth = root.count(os.path.sep)
        if base_depth + depth <= cur_depth:
            del dirs[:]

if __name__=='__main__':
    # set inputFolder to your e-reader's location
    # Convert all pdf documents in this folder unless they don't contain JPX layers. Do not keep originals
    # Not a raw string: r'F:\\' holds two backslashes, which skews walklevel's depth count
    inputFolder = 'F:\\'
    max_depth = 3
    print('Converting files in folder:', inputFolder)
    for root, _, files in walklevel(inputFolder, max_depth):
        for f in files:
            if not f.lower().endswith('.pdf'):
                continue
            convertFile(os.path.join(root, f))
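For anyone running the script above on a computer rather than on the e-reader itself, the overwrite step could keep the original around. This is only a sketch of one way to do that: `replace_with_backup` and the `.orig` suffix are names I made up, not part of the script above.

```python
import os
import shutil


def replace_with_backup(converted, original, backup_suffix='.orig'):
    """Move `converted` over `original`, keeping the old file as original + suffix."""
    backup = original + backup_suffix
    # Do not clobber an earlier backup: the first original is the one worth keeping.
    if not os.path.exists(backup):
        shutil.copy2(original, backup)
    shutil.move(converted, original)
    return backup
```

In convertFile, the `shutil.move(tmpfile, filename)` call would become `replace_with_backup(tmpfile, filename)`. Because the backup name no longer ends in `.pdf`, later runs of the folder walk skip it.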
Old 11-23-2024, 11:57 AM   #18
jonnyl
Zealot
Posts: 136
Karma: 33084
Join Date: Jan 2021
Device: Likebook Mars
@Frenzie: Using your script above I get white text on black background, but besides that, it works great. Is that how it's supposed to come out? Is there any way to invert it? My mutool version is 1.23.10.

Swapping the .clear() numbers didn't change anything.

Last edited by jonnyl; 11-23-2024 at 12:01 PM.
Old 11-24-2024, 02:38 PM   #19
Frenzie
Quote:
Is there any way to invert it?
Only on 1.21 using the previous script I posted, or if you can figure out a way around the issue I noted in the comments:
Code:
			// In newer versions this seems to say "Error: truncated jbig2 segment header"
			// var fgImg = doc.addImage(new Image(fgPix, doc.loadImage(mask)));

			// Quick workaround for the above problem
			var fgImg = doc.graftObject(mask);
I suspect that won't be possible without diving into the MuPDF source though.
Old 11-25-2024, 08:43 AM   #20
jonnyl
I got v1.21 for Windows, and it's working beautifully now. Thanks!
Old 01-17-2025, 01:03 PM   #21
jonnyl
Unfortunately, the script only works on some PDFs. On others, I get a whole book's worth of empty white pages. I ran `mutool extract` without the script on both kinds of books, but I can't tell the difference. For each page, and in the same order, both kinds yield two blurry images and one legible monochrome inverted image (the only one I need, ideally inverted back to black text on white).

Is there anything I could do to make the script work on the other kind of books as well?
Old 01-17-2025, 02:34 PM   #22
Frenzie
You'd have to link some of these other books first. Though my guess would be they don't separate the "text" and the "page background" into two layers but simply have it all as one image.
Old 01-17-2025, 04:42 PM   #23
jonnyl
I found the problem. In the PDF that wasn't working, the monochrome mask images are typed as 'mask', while in the working one they are typed as 'smask' (according to `pdfimages -list`). I just edited "var mask = xobj.SMask;" to "var mask = xobj.Mask;" and to my astonishment that worked.
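To tell the two kinds of books apart without reading the listing by eye, the type column of `pdfimages -list` can be tallied. Below is a rough sketch; it assumes Poppler's usual table layout with the type in the third column, the function names are my own, and the sample listing is made up for illustration rather than taken from a real book.

```python
import subprocess
from collections import Counter


def count_image_types(listing):
    """Count the values in the 'type' column of `pdfimages -list` output."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # Data rows start with a page number; this skips the header and the dashed rule.
        if len(fields) >= 3 and fields[0].isdigit():
            counts[fields[2]] += 1
    return counts


def list_image_types(pdf_path, pdfimages='pdfimages'):
    """Run `pdfimages -list` on a file and tally its image types."""
    result = subprocess.run([pdfimages, '-list', pdf_path],
                            capture_output=True, text=True)
    return count_image_types(result.stdout)


# Made-up sample listing, only to show the expected shape of the input
SAMPLE = """\
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     600   900  rgb     3   8  jpx    no        12  0    72    72 20.1K 1.3%
   1     1 image    2400  3600  rgb     3   8  jpx    no        13  0   300   300 55.0K 0.2%
   1     2 smask    2400  3600  gray    1   1  jbig2  no        13  0   300   300 18.2K 1.7%
"""

print(count_image_types(SAMPLE))
```

A book whose tally shows 'mask' entries but no 'smask' entries would be the kind that needs `xobj.Mask` in the mutool script.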
Old 01-17-2025, 06:25 PM   #24
Frenzie
In that case you can probably just write it like:
Code:
var mask = xobj.SMask || xobj.Mask;
(or the other way around)
Old 01-18-2025, 04:30 AM   #25
jonnyl
Confirmed this works. Thanks!