Quote:
Originally Posted by thiago.eec
Just checked here: EPUBCheck treats invalid and DRMed files differently. If the book still has DRM, EPUBCheck can't analyze the file, so it returns a FATAL error. If the book has parsing errors (like a mismatched tag), EPUBCheck also aborts the check and returns a FATAL error. When the book is not malformed, the other errors are marked as ERROR.
A DRMed book can't have its result marked as FALSE (red X), because it was never analyzed. On the other hand, the result of a book with parsing errors should be marked as FALSE.
The problem is that EPUBCheck treats them equally: if it can't finalize the check, the file is marked with a FATAL flag.
I guess the only practical option would be to clear the status column when FATAL errors occur.
Thanks so much for the explanation. I completely missed the difference between the handling of fatal and non-fatal errors.
I did figure out a way to mark invalid epubs as FALSE using an Action Chains script that reads the JSON reports and sets results to No if there are fatal errors (it also clears/replaces previous Yes results). It doesn't touch the result column for anything else (e.g. files with no report or with a pre-existing pass or fail because of non-fatal errors). If a report is gone or can't be read, it just skips that epub.
You can run it on your entire library and it will flag all the FATAL aborts. It's been really fast for me. Almost instantaneous, even on large batches (although I know that will vary by system).
Script is below and Action Chain is attached, in case others find it useful.
Script:
Code:
import json
import re
import urllib.parse

def run(gui, settings, chain):
    db = gui.current_db
    newdb = db.new_api
    book_ids = list(gui.library_view.get_selected_ids())
    for book_id in book_ids:
        # Skip books that have no EPUBCheck report.
        report_html = newdb.field_for('#epubcheck_report', book_id)
        if not report_html:
            continue
        # The report column stores an HTML link to the JSON report on disk.
        match = re.search(r'href="file:///(.*?)"', report_html)
        if not match:
            continue
        path = '/' + urllib.parse.unquote(match.group(1))
        try:
            with open(path, 'r', encoding='utf-8') as f:
                data = json.load(f)
        except (OSError, json.JSONDecodeError):
            # Report missing or unreadable: skip this epub.
            continue
        # Flag only books whose check aborted with FATAL errors.
        nFatal = data.get('checker', {}).get('nFatal', 0)
        if nFatal > 0:
            newdb.set_field('#epubcheck_result', {book_id: False})
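For reference, the only part of the report the script relies on is the checker summary. Here's a hypothetical minimal report with that shape (a real EPUBCheck JSON report contains many more fields; only `checker.nFatal` matters here):

```python
import json

# Hypothetical minimal example of the report shape the script expects.
sample = '{"checker": {"nFatal": 1, "nError": 0, "nWarning": 2}}'
data = json.loads(sample)
nFatal = data.get('checker', {}).get('nFatal', 0)
print(nFatal > 0)  # → True; a report like this gets flagged FALSE
```

If `checker` or `nFatal` is missing, the `.get(...)` chain falls back to 0 and the book is left untouched.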
FWIW, I initially tried setting up a chain that ran EPUBCheck and then the script, but it wasn't reliable: I couldn't figure out how to make sure the reports were written before the script ran, even with small batches (like 5 books). I tried adding a delay, but it wasn't long enough, and I didn't want to keep guessing and slowing things down further. The benefit didn't seem worth it when the script is easy to run separately and super fast on its own.
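If anyone wants to retry the combined chain, one alternative to a fixed delay would be polling until the report file exists and its size stops changing between polls. This is an untested sketch of my own (the `wait_for_report` helper and its parameters are not part of Action Chains or calibre), and it still only guesses at completeness:

```python
import os
import time

def wait_for_report(path, timeout=30.0, poll=0.5):
    """Poll until `path` exists and its size is stable across two polls,
    or give up after `timeout` seconds. Returns True if the file looks
    complete, False on timeout."""
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        if os.path.exists(path):
            size = os.path.getsize(path)
            if size > 0 and size == last_size:
                return True  # size unchanged since last poll: likely written
            last_size = size
        time.sleep(poll)
    return False
```

The script above could call this before `json.load` and `continue` on False, but given how fast the standalone run is, I'm not convinced the extra complexity is worth it.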