MobileRead Forums > E-Book Formats > Workshop

01-15-2016, 07:15 PM   #1
roger64
Wizard
Posts: 2,623
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Work on an unzipped EPUB xhtml file

Hi

I use Linux. If I unzip an EPUB, I can run a Python script from a terminal on the .xhtml files, and that way I can perform some tasks I am unable to do directly on the EPUB.

However, things do not appear to be that easy, especially when it comes to saving the output. Are there any recommendations for safely modifying these .xhtml files?

The goal is to modify one of these files and import it back into the EPUB. This is what the script looks like:

Code:

#!/usr/bin/env python3
# Run from the folder that holds the chapterN.xhtml files of the unzipped EPUB.

import re, os, sys, glob

pref, suff = 'chapter', '.xhtml'

# Find the highest-numbered file (the one holding the notes)
def num_fichier(fic):
    k = re.search(r'%s(\d+)%s$' % (re.escape(pref), re.escape(suff)), fic)
    if k:
        return int(k.group(1))

# Keep only names that really are chapterN.xhtml, so sorting never sees None
fichiers = [f for f in glob.glob('%s*[0-9]%s' % (pref, suff)) if num_fichier(f) is not None]
if not fichiers:
    sys.exit("No %sN%s file found in the current folder" % (pref, suff))
fichiers.sort(key=num_fichier)
der = num_fichier(fichiers[-1])
print("der=%d" % der)

# Make sure the output file does not already exist
out = '%s%smodif%s' % (pref, der, suff)
if os.path.lexists(out):
    sys.stderr.write("\nWarning: file %s already exists\n\n" % out)
    sys.exit(1)

# Search for: href="fileN#ftnX" id="bodyftnX"
rec_lien = re.compile(r'(href="%s(?P<fil>\d+)%s#ftn(?P<id>\d+)"\s+id="bodyftn(?P=id)")' % (pref, re.escape(suff)))
# Search for: href="last_file#ftnX" id="bodyftnX" (not used below)
rec_lien99 = re.compile(r'(href="%s%s%s#ftn(?P<id>\d+)"\s+id="bodyftn(?P=id)")' % (pref, der, re.escape(suff)))
# Search for: href="last_file#bodyftnX" id="ftnX"; both %s are filled with X later.
# The closing quote sits inside group 2 so that it is written back unchanged.
lien99 = r'(href="%s)%s(%s#bodyftn%%s"\s+id="ftn%%s")' % (pref, der, re.escape(suff))

# List the links found in every file except the last one
liste_liens = []
for num_fic in range(1, der):
    try:
        with open('%s%s%s' % (pref, num_fic, suff), 'r', encoding='utf-8') as fic:
            liens = rec_lien.findall(fic.read())
    except FileNotFoundError:
        continue
    for lien in liens:
        num_fil = lien[1]
        num_id = lien[2]
        if num_fil == str(der):
            liste_liens.append((num_fic, num_fil, num_id, lien[0]))

# fichier[id]: number of the file that contains the link with this id
fichier = {}
for num_fic, num_fil, num_id, lien in liste_liens:
    # print("%s%-2s%s %2s %2s => %s" % (pref, num_fic, suff, num_fil, num_id, lien))
    fichier[num_id] = str(num_fic)

# Fix the links in the last file
with open('%s%s%s' % (pref, der, suff), 'r', encoding='utf-8') as fic:
    f99 = fic.read()
f99bis = f99
for num_id in fichier:
    k = re.search(lien99 % (num_id, num_id), f99bis)
    if k:
        f99bis = f99bis[:k.start(0)] + k.group(1) + fichier[num_id] + k.group(2) + f99bis[k.end(0):]

# Write the result to a new file; the original chapter is left untouched
with open(out, 'w', encoding='utf-8') as fic:
    fic.write(f99bis)

Am I missing something obvious? Any practical recommendations would be appreciated.
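On the "saving the output" part of the question: one common precaution, which the script above does not take, is to read and write each .xhtml file with an explicit UTF-8 encoding and to write to a temporary file that replaces the original only once it is complete. A minimal sketch, assuming the chapters are UTF-8 (the usual encoding for EPUB XHTML); rewrite_xhtml is a hypothetical helper name, not part of any tool mentioned in this thread:

```python
import os
import shutil
import tempfile


def rewrite_xhtml(path, transform):
    """Apply transform(text) -> text to one .xhtml file, replacing it safely.

    Returns True if the file changed, False if transform left it identical.
    """
    # Explicit UTF-8 instead of the locale default; newline='' keeps the
    # file's original line endings untouched.
    with open(path, 'r', encoding='utf-8', newline='') as f:
        original = f.read()
    modified = transform(original)
    if modified == original:
        return False
    # Write to a temporary file in the same folder, then swap it in, so an
    # error halfway through never leaves a truncated chapter behind.
    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=folder, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8', newline='') as tmp:
            tmp.write(modified)
        shutil.copymode(path, tmp_path)  # keep the original permissions
        os.replace(tmp_path, path)       # atomic rename on POSIX file systems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return True
```

It would be called as, say, rewrite_xhtml('chapter12.xhtml', fix) with any function fix that takes and returns the file's text. The edited folder still has to be zipped back into an EPUB afterwards.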

Last edited by roger64; 01-15-2016 at 07:28 PM.
01-16-2016, 02:30 AM   #2
Doitsu
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
AFAIK, a zipped EPUB archive needs to be packed in a certain sequence: the mimetype file must be added first, and uncompressed.
The Sigil Plugin runner routines contain this Python 3 code that worked fine for me:

Code:
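import os
import zipfile

epub_mimetype = b'application/epub+zip'


# Sigil's pathof() and unipath.walk() helpers are replaced below by plain
# os calls so that the code runs on its own; str paths are assumed.
def unzip_epub_to_dir(path_to_epub, destdir):
    with zipfile.ZipFile(path_to_epub) as sz:
        for name in sz.namelist():
            if name.endswith('/'):
                continue  # directory entry: folders are created as needed
            data = sz.read(name)
            filepath = os.path.join(destdir, name.replace("/", os.sep))
            basedir = os.path.dirname(filepath)
            if basedir:
                os.makedirs(basedir, exist_ok=True)
            with open(filepath, 'wb') as fp:
                fp.write(data)


def walk_files(top):
    # Stand-in for unipath.walk(): every file below top, as a relative
    # path with '/' separators (the form ZIP entry names use).
    files = []
    for root, _dirs, names in os.walk(top):
        for n in names:
            rel = os.path.relpath(os.path.join(root, n), top)
            files.append(rel.replace(os.sep, '/'))
    return files


def epub_zip_up_book_contents(ebook_path, epub_filepath):
    files = walk_files(ebook_path)
    if 'mimetype' not in files:
        raise Exception('mimetype file is missing')
    files.remove('mimetype')
    with zipfile.ZipFile(epub_filepath, 'w') as outzip:
        # mimetype must be the first entry, stored without compression
        outzip.write(os.path.join(ebook_path, 'mimetype'), 'mimetype', zipfile.ZIP_STORED)
        # everything else can be deflated
        for file in files:
            filepath = os.path.join(ebook_path, file.replace('/', os.sep))
            outzip.write(filepath, file, zipfile.ZIP_DEFLATED)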
You can find the latest version (with all required imports, e.g. zipfile, os) on GitHub.
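To confirm that a repacked archive actually follows the rule described above (mimetype as the first entry, stored uncompressed), the standard zipfile module can inspect the entries. A minimal sketch; check_epub_container is a hypothetical name, and it only looks at the mimetype entry, not at the rest of the EPUB:

```python
import zipfile


def check_epub_container(epub_path):
    """Return a list of problems with the EPUB's mimetype entry (empty if none)."""
    problems = []
    with zipfile.ZipFile(epub_path) as zf:
        infos = zf.infolist()  # entries in the order they are stored on disk
        if not infos or infos[0].filename != 'mimetype':
            problems.append('mimetype is not the first entry')
            return problems
        first = infos[0]
        if first.compress_type != zipfile.ZIP_STORED:
            problems.append('mimetype is compressed')
        if zf.read(first) != b'application/epub+zip':
            problems.append('mimetype content is not application/epub+zip')
    return problems
```

An empty list only means the container rule is satisfied; other problems in the book are better caught by a full validator such as EPUBCheck.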

Since you're a Linux user, you could also use a shell script.

Alternatively, you could run your Python code in the Calibre editor as a function, or write a Sigil plugin.

That way, all the packing and unpacking is handled by the host application.

Last edited by Doitsu; 01-16-2016 at 02:57 AM.
01-16-2016, 02:55 AM   #3
roger64
Wizard
Posts: 2,623
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
@Doitsu

Thanks for sharing this code.

I thought I could import any .xhtml file directly from the Calibre editor...
01-16-2016, 03:21 AM   #4
Doitsu
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
@roger64: Since your Python code seems to have something to do with footnotes, also check out my AddIDs plugin.
If you have the same number of footnote references and footnotes (and both are in the same order), you might be able to use it to assign the proper ids to footnote references and footnotes. (You'd run it twice: once for the footnote references and once for the footnote definitions.)
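The "same number, same order" condition matters because ids can then be assigned purely by position. This is not the AddIDs plugin's code, only a plain-Python sketch of the idea, assuming the references and the notes are <a> elements carrying a distinguishing class (the class names and the renumber_ids helper are hypothetical):

```python
import re
from itertools import count


def renumber_ids(html, css_class, prefix, counter=None):
    """Give each <a> carrying class="css_class" a sequential id: prefix1, prefix2, ...

    Pass the same itertools.count object to several calls to keep numbering
    going across files; by default numbering starts again at 1.
    """
    if counter is None:
        counter = count(1)
    tag = re.compile(r'<a\b([^>]*\bclass="%s"[^>]*)>' % re.escape(css_class))

    def new_tag(m):
        # Drop any existing id attribute, then append the sequential one.
        attrs = re.sub(r'\s+id="[^"]*"', '', m.group(1))
        return '<a%s id="%s%d">' % (attrs, prefix, next(counter))

    return tag.sub(new_tag, html)
```

Mirroring the two runs described above, it would be applied once to the chapters with the reference class and prefix fnbl, and once to the notes with the note class and prefix fn.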
01-16-2016, 07:50 AM   #5
roger64
Wizard
Posts: 2,623
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Quote:
Originally Posted by Doitsu
@roger64: Since your Python code seems to have something to do with footnotes, also check out my AddIDs plugin.
If you have the same number of footnote references and footnotes (and both are in the same order), you might be able to use it to assign the proper ids to footnote references and footnotes. (You'd run it twice: once for the footnote references and once for the footnote definitions.)
This script goes one step beyond that. If the links are broken (even if only the return links are), I have used two regexes, like you, to recreate the links: first the return links, with bad chapter numbers, then the body links, which point to the notes file. In the return links, the body chapter number is missing or wrong. This script retrieves those missing body chapter numbers and writes them to a new .xhtml file.
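Stripped of the file handling, the approach described here comes down to two steps: record which chapter contains each note reference, then rewrite each return link in the notes file to point at that chapter. A condensed sketch using the script's naming pattern (chapterN.xhtml, ftnN, bodyftnN) and taking file contents as strings; unlike the script, it replaces whatever file name the return link currently carries, and repair_backlinks is a hypothetical name:

```python
import re


def repair_backlinks(chapters, notes_name, notes_html):
    """Re-point the return links of notes_html at the chapters citing each note.

    chapters maps a chapter file name to its XHTML text.
    """
    # Step 1: in each chapter, find href="<notes_name>#ftnN" id="bodyftnN"
    # and remember which chapter holds reference N.
    ref = re.compile(r'href="%s#ftn(\d+)"\s+id="bodyftn\1"' % re.escape(notes_name))
    home = {}
    for name, html in chapters.items():
        for note_id in ref.findall(html):
            home[note_id] = name
    # Step 2: in the notes file, match href="<any file>#bodyftnN" id="ftnN"
    # and substitute the chapter found in step 1.
    back = re.compile(r'href="[^"#]*#bodyftn(\d+)"(\s+)id="ftn\1"')

    def fix(m):
        note_id = m.group(1)
        if note_id not in home:
            return m.group(0)  # no reference found: leave this link as it is
        return 'href="%s#bodyftn%s"%sid="ftn%s"' % (home[note_id], note_id, m.group(2), note_id)

    return back.sub(fix, notes_html)
```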

If you are interested, I can PM you a test file to try this script on. It's quite efficient and quick, apart from this defect... Maybe it could be integrated into your plugin.

At first I thought I could turn this into a Calibre function, but no support seems to be available and I don't know how to proceed...
https://www.mobileread.com/forums/sho...41&postcount=1

Last edited by roger64; 01-16-2016 at 08:08 AM. Reason: function
01-16-2016, 08:09 AM   #6
Doitsu
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by roger64
This script goes one step beyond that. If the links are broken (even if only the return links are), I have used two regexes to recreate the links: first the return links, with bad chapter numbers, then the body links, which point to the notes file. In the return links, the body chapter number is missing or wrong. This script retrieves those missing body chapter numbers and writes them to a new .xhtml file.
If your book contains, for example, 10 footnote references and 10 footnote definitions in a separate file, and each group is tagged with its own unique class, you could simply first overwrite all existing footnote reference ids with id="fnbl1" to id="fnbl10" and then all footnote definition ids with id="fn1" to id="fn10". You'd then only need one or two regex searches to add the required links and backlinks.
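Once the ids follow that fnblN/fnN pattern, each link can be derived from the number in its own id, which is why one regex per side is enough. A sketch under those assumptions: the notes file name and the chapter_of mapping (note number to the chapter holding its reference) are hypothetical inputs the caller must supply, and any existing href on the tag is replaced:

```python
import re


def link_references(chapter_html, notes_file):
    """Point every <a ... id="fnblN"> at notes_file#fnN, replacing any old href."""
    def fix(m):
        attrs = re.sub(r'\s+href="[^"]*"', '', m.group(1))
        return '<a%s href="%s#fn%s">' % (attrs, notes_file, m.group(2))
    return re.sub(r'<a\b([^>]*\bid="fnbl(\d+)"[^>]*)>', fix, chapter_html)


def link_notes(notes_html, chapter_of):
    """Point every <a ... id="fnN"> back at chapter_of[N]#fnblN.

    chapter_of maps note numbers (int) to chapter file names; a missing
    number raises KeyError rather than producing a silently broken link.
    """
    def fix(m):
        n = int(m.group(2))
        attrs = re.sub(r'\s+href="[^"]*"', '', m.group(1))
        return '<a%s href="%s#fnbl%d">' % (attrs, chapter_of[n], n)
    return re.sub(r'<a\b([^>]*\bid="fn(\d+)"[^>]*)>', fix, notes_html)
```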

Quote:
Originally Posted by roger64
If you are interested, I can PM you a test file to try this script on. It's quite efficient and quick, apart from this defect... Maybe it could be integrated into your plugin.
I'm mostly interested in developing plugins that can be repeatedly used; developing plugins just to fix a one-off special problem simply doesn't make sense.

BTW, also check out the Sigil footnote plugin.
01-16-2016, 09:16 AM   #7
roger64
Wizard
Posts: 2,623
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
I think we are not talking about the same thing. Your plugin checks the ids on both sides. This script checks the chapter numbers on the return side (on the body side, the links usually all point to the same chapter containing the notes, so they are easy to check).

Even if the ids are correct, a wrong chapter number is enough to break the return link. So a safety check of the chapter numbers confirms that the links work in both directions. Ids and chapter numbers are the two variable elements of any link.
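Because a link only works when both its file name and its id are right, a safety check can resolve every in-book href and confirm that the target file exists and contains the target id. A rough, regex-based sketch (a robust checker would parse the XHTML instead); it assumes all .xhtml files sit in one folder and that links use bare file names, and broken_links is a hypothetical name:

```python
import os
import re

# href="file#fragment" or href="#fragment"; the ':' exclusion skips http: etc.
HREF = re.compile(r'href="([^"#:]*)#([^"]+)"')
ID = re.compile(r'\bid="([^"]+)"')


def broken_links(folder):
    """Yield (source file, href) for every in-book link whose target is missing."""
    names = sorted(n for n in os.listdir(folder) if n.endswith('.xhtml'))
    texts, ids = {}, {}
    for n in names:
        with open(os.path.join(folder, n), encoding='utf-8') as f:
            texts[n] = f.read()
        ids[n] = set(ID.findall(texts[n]))
    for n in names:
        for target_file, fragment in HREF.findall(texts[n]):
            target = target_file or n  # href="#x" points into the same file
            if fragment not in ids.get(target, ()):
                yield n, '%s#%s' % (target_file, fragment)
```

An empty result means every file name and id pair resolves; each reported href names the file and id that failed.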

I asked a friend to write this script because I had to deal with some books with broken links. To put back the missing (or wrong) chapter numbers, I had to do it manually, jumping from one to another or...

I'll show you.

Last edited by roger64; 01-16-2016 at 09:19 AM.
01-17-2016, 12:06 AM   #8
roger64
Wizard
Posts: 2,623
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Interested people can now follow this thread here:

https://www.mobileread.com/forums/sho...66&postcount=1

@Doitsu
Thanks for your expert help with debugging the script.

Last edited by roger64; 01-17-2016 at 03:33 AM.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.