MobileRead Forums

Plugin for tidying ePub files (https://www.mobileread.com/forums/showthread.php?t=264378)

Doitsu 09-15-2015 06:19 AM

Quote:

Originally Posted by gipsy (Post 3170894)
OK, that's weird...
I tested the latest version of the plugin with a portable Sigil and the hyphens fix works fine.
It doesn't work with the installed version of Sigil.

The latest version (ePubTidyTool_v0.1_1_1_1_3.zip) works fine with double backslashes in the WordDictionary.txt file path on my Windows 10 machine.

Maybe the problem is caused by a leftover ePubTidyTool.json file. You may want to manually delete the following folders and reinstall the plugin.

Code:

%localappdata%\sigil-ebook\sigil\plugins_prefs\ePubTidyTool
%localappdata%\sigil-ebook\sigil\plugins\ePubTidyTool
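
For anyone who prefers to script the cleanup, here is a minimal Python sketch. It assumes the two folder locations listed above; the function names and the dry-run default are illustrative only and are not part of Sigil or the plugin.

```python
import os
import shutil

PLUGIN = "ePubTidyTool"

def leftover_dirs(local_appdata):
    """Build the two folder paths named in the post, below the given LOCALAPPDATA directory."""
    base = os.path.join(local_appdata, "sigil-ebook", "sigil")
    return [os.path.join(base, sub, PLUGIN) for sub in ("plugins_prefs", "plugins")]

def remove_leftovers(local_appdata, dry_run=True):
    """Return the folders that exist; delete them only when dry_run is False."""
    found = []
    for path in leftover_dirs(local_appdata):
        if os.path.isdir(path):
            if not dry_run:
                shutil.rmtree(path)
            found.append(path)
    return found

if __name__ == "__main__":
    appdata = os.environ.get("LOCALAPPDATA")  # only set on Windows
    if appdata:
        for path in remove_leftovers(appdata, dry_run=True):
            print("would remove:", path)
```

Close Sigil first, check the printed paths, and only then call remove_leftovers(appdata, dry_run=False) before reinstalling the plugin.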


gipsy 09-15-2015 06:42 AM

It works now. Thanks, Doitsu.

@CalibUser I added some more fixes :D

Code:

        #Fixes '…' when PDFd as ...
        CorrectText("Changed ... to …", r'\.\.\.', r'…')

        #Fixes 'η' when PDFd as ΐ]
        CorrectText("Changed ΐ] to η", r'ΐ]', r'η')
       
        #Fixes 'στη' when PDFd as σιη
        CorrectText("Changed σιη to στη", r'σιη', r'στη')

        #Fixes 'στ(η|ο|ον|α|ις|ην)' when PDFd as 'οτ(η|ο|ον|α|ις|ην)'
        CorrectText("Changed οτ(η|ο|ον|α|ις|ην) to στ(η|ο|ον|α|ις|ην)", r' οτ(η|ο|ον|α|ις|ην) ', r' στ\1 ')

        #Fixes 'των' when PDFd as 'τ(οι|ιο)ν'
        CorrectText("Changed τ(οι|ιο)ν to των", r' τ(οι|ιο)ν ', r' των ')

        #Fixes 'ού' when PDFd as  'οιί'
        CorrectText("Changed οιί to ού", r'οιί', r'ού')

        #Fixes 'στις' when PDFd as σιις
        CorrectText("Changed σιις to στις", r'σιις', r'στις')

        #Fixes 'στ(η|ο|ον|ην)' when PDFd as οτ(η|ο|ον|ην)
        #(the replacement keeps the surrounding spaces so words are not joined)
        CorrectText("Changed οτ(η|ο|ον|ην) to στ(η|ο|ον|ην)", r' οτ(η|ο|ον|ην) ', r' στ\1 ')

        #Fixes 'στ(ο|ου|α)' when PDFd as σι(ο|ου|α)
        CorrectText("Changed σι(ο|ου|α) to στ(ο|ου|α)", r' σι(ο|ου|α)', r' στ\1')

        #Fixes 'ώ' when PDFd as ο'ι or (ί)
        CorrectText("Changed ο'ι or (ί) to ώ", r'(ο\'ι|\(ί\))', r'ώ')
       
        #Fixes 'Άκουσ' when PDFd as Ακόυσ
        CorrectText("Changed Ακόυσ to Άκουσ", r'Ακόυσ', r'Άκουσ')
       
        #Fixes 'γι’' when PDFd as γΓ,γΡ
        CorrectText("Changed γΓ γΡ to γι’", r'(γΓ|γΡ)', r'γι’')

        #Fixes 'ντι' when PDFd as νπ
        CorrectText("Changed νπ to ντι", r'νπ', r'ντι')
       
        #Fixes 'Γι’' when PDFd as ΓΓ
        CorrectText("Changed ΓΓ to Γι’", r'ΓΓ ', r'Γι’ ')

        #Fixes 'σχεδίαζ' when PDFd as σχέδιαζ
        CorrectText("Changed σχέδιαζ to σχεδίαζ", r'σχέδιαζ', r'σχεδίαζ')
       
        #Fixes '\u0388' when PDFd as 'E "E
        CorrectText("Changed 'E,\"E to \u0388", r'(\'|\")(\u0395)', r'Έ')

        #Fixes \u038E when PDFd as 'Y or "Y
        CorrectText("Changed 'Y,\"Y to \u038E", r'(\'|\")(\u03A5)', r'Ύ')

        #Fixes \u038A when PDFd as 'I or "I
        CorrectText("Changed 'I,\"I to \u038A", r'(\'|\")(\u0399)', r'Ί')

        #Fixes \u038C when PDFd as 'O or "O
        CorrectText("Changed 'O,\"O to \u038C", r'(\'|\")(\u039F)', r'Ό')

        #Fixes \u0386 when PDFd as 'A or "A
        CorrectText("Changed 'A,\"A to \u0386", r'(\'|\")(\u0391)', r'Ά')

        #Fixes \u0389 when PDFd as 'H or "H
        CorrectText("Changed 'H,\"H to \u0389", r'(\'|")(\u0397)', r'Ή')

        #Fixes \u038F when PDFd as '\u03C9 or "\u03C9
        CorrectText("Changed '\u03C9,\"\u03C9 to \u038F", r'(\'|\")(\u03C9)', r'Ώ')

        #Fixes \u03CD when PDFd as \u03B0
        CorrectText("Changed \u03B0 to \u03CD", r'ΰ', r'ύ')

        #Fixes 'έ' when PDFd as ε'
        CorrectText("Changed ε' to έ", r'ε\'', r'έ')

        #Fixes ς Character when PDFd as ςCharacter
        CorrectText("Changed ςCharacter to ς Character", r'ς([\u0370-\u03CE])', r'ς \1')
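
To try rules like these outside Sigil, something along the following lines may help. It is only a sketch: correct_text is a hypothetical stand-in built on re.subn, not the plugin's actual CorrectText, and the sample sentence is made up for the test.

```python
import re

def correct_text(message, pattern, replacement, text):
    """Hypothetical stand-in for CorrectText: apply one rule and report how often it fired."""
    new_text, count = re.subn(pattern, replacement, text)
    if count:
        print("{} ({} replacement(s))".format(message, count))
    return new_text

# Made-up sample containing two of the misreadings listed above.
sample = "Πήγε οτο σπίτι τιον φίλων."
sample = correct_text("Changed οτ(η|ο|ον|α|ις|ην) to στ(η|ο|ον|α|ις|ην)",
                      r' οτ(η|ο|ον|α|ις|ην) ', r' στ\1 ', sample)
sample = correct_text("Changed τ(οι|ιο)ν to των", r' τ(οι|ιο)ν ', r' των ', sample)
print(sample)
```

Because these patterns consume the space on each side, they will not match a word at the start of a line or directly before punctuation. Word boundaries (\b) might be an alternative, but that would need testing against real texts.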


CalibUser 09-16-2015 03:35 PM

I have updated the plugin in the first post to include the latest changes for Greek texts from gipsy.

Thanks for pointing out that it may be necessary to delete any leftover ePubTidyTool.json files, Doitsu. I will add that advice to the epub guide for this plugin when I next update it.

CalibUser 09-18-2015 11:05 AM

I have updated the plugin in the first post so that images in an ePub can be resized.

This will require the Pillow image library from https://pypi.python.org/pypi/Pillow/2.9.0

I decided to make the plugin 'beep' if an alphabetical character is typed into an entry box for image size. As I could not find a portable 'beep' facility in Python, I had to produce a 'beep' that is system dependent. I have only been able to test this on Windows 7; hopefully it will work on Linux (I believe it will be necessary to install sox for the 'beep') and Mac systems too.

One problem I had with the code for resizing the image was that I could not seem to read the image file using bk.readfile() into a buffer that Pillow could process (I experimented with frombytes(), frombuffer() and fromarray()), so I resorted to saving the original image to disk and opening it again with Pillow, which is not an ideal process.

The ePub manual in the first post contains instructions on how to use the new feature.

KevinH 09-18-2015 11:12 AM

Quote:

Originally Posted by CalibUser (Post 3173121)
One problem I had with the code for resizing the image was that I could not seem to read the image file using bk.readfile() into a buffer that Pillow could process (I experimented with frombytes(), frombuffer() and fromarray()), so I resorted to saving the original image to disk and opening it again with Pillow, which is not an ideal process.

Any file that does not have an application/xhtml+xml media type is read as pure binary by the launcher and returned that way. It will be a string of bytes.

At least, that is what it should do.

I will look at the launcher code to make sure that is really what is happening, just in case. I have no idea how data is passed to Pillow, but if it wants a file you should be able to use StringIO with what bk.readfile() returns from an image.

I will look into this in case a fix is needed.

Thanks,

KevinH

Doitsu 09-18-2015 12:51 PM

Quote:

Originally Posted by CalibUser (Post 3173121)
One problem I had with the code for resizing the image was that I could not seem to read the image file using bk.readfile() into a buffer that Pillow could process [...]

AFAIK, you'll need to use BytesIO() to convert the image data returned by bk.readfile().

For example:

Code:

from PIL import Image
from io import BytesIO

def run(bk):
    data = bk.readfile('cover.jpg')
    img = Image.open(BytesIO(data))
    (width, height) = img.size
    print(width, height)


CalibUser 09-18-2015 03:32 PM

Thanks, Doitsu, this method allows me to read an image from an epub into img. After resizing the image, how can I write it back in a format that bk.writefile() can use? I've tried a few approaches including bkimage = BytesIO(self.img ) but without success.

Doitsu 09-18-2015 04:21 PM

Quote:

Originally Posted by CalibUser (Post 3173302)
Thanks, Doitsu, this method allows me to read an image from an epub into img. After resizing the image, how can I write it back in a format that bk.writefile() can use? I've tried a few approaches including bkimage = BytesIO(self.img ) but without success.

AFAIK, you'll have to use BytesIO.

Code:

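from PIL import Image
from io import BytesIO

def run(bk):
    data = bk.readfile('cover.jpg')
    img = Image.open(BytesIO(data))
    # Shrink in place to fit within 330x330, keeping the aspect ratio.
    # Image.LANCZOS is the same filter as Image.ANTIALIAS, a name that newer Pillow releases removed.
    img.thumbnail((330, 330), Image.LANCZOS)
    # Save into an in-memory buffer; the format is named explicitly because there is no file extension.
    imagedata = BytesIO()
    img.save(imagedata, 'png')
    thumbnail = imagedata.getvalue()
    bk.addfile('thumbnail.png', 'thumbnail.png', thumbnail, 'image/png')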

(For some odd reason img.save() doesn't appear to work with jpg as the file format.)

CalibUser 09-18-2015 06:08 PM

Thanks, Doitsu. I will try to get something to work for jpg files when I have more time.

KevinH 09-18-2015 10:33 PM

Hi,
The launcher code is correct. Doitsu is right: BytesIO is the way to go.

Remember that BytesIO gives a file-like interface, so after writing to it you need to call seek(0) before trying to read from it. Also, when passing data through BytesIO, pass the image format type to Image explicitly, since it can't see a file extension that would convey the image type. See some examples in Pillow's docs or via a Google search for more info. If you run into difficulty, post some code and I'd be happy to figure out what is going on.

KevinH
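
The seek(0) point can be seen with the standard library alone. This is a small illustration using placeholder bytes rather than a real image:

```python
from io import BytesIO

buffer = BytesIO()
buffer.write(b"fake image bytes")  # the position is now at the end of the data

at_end = buffer.read()       # reads from the current position, so nothing is returned
buffer.seek(0)               # rewind to the start
from_start = buffer.read()   # now the full contents come back
whole = buffer.getvalue()    # getvalue() ignores the position entirely

print(at_end, from_start, whole)
```

Doitsu's example uses getvalue(), which is why it works without an explicit seek(0).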

CalibUser 09-20-2015 03:03 PM

I have improved the code in the plugin for resizing images so that temporary files are not written to disc (thanks to Doitsu and KevinH) as in my previous version. The updated plugin is in the first post of this thread.

Technical note:
As Doitsu pointed out, "img.save() doesn't appear to work with jpg as the file format." According to information on the internet, jpg and jpeg files have the same format. Early versions of Windows could not handle four-character extensions as other operating systems could, so jpg was used as the extension on Windows. Modern Windows can handle four-character extensions, so this is no longer a problem, and it seems that Python uses jpeg only. By treating jpg files as jpeg files, I have made the plugin work with jpg file extensions.

KevinH 09-20-2015 07:08 PM

Hi,
When using BytesIO() there is no filename or extension. Therefore you must tell the Image routine the type of image you are reading in or saving. These format names are not the same as file extensions. For the .jpg, .jpeg, etc. file extensions the format name is "JPEG". So to save a jpeg (.jpg) image to a BytesIO() object you should always use "JPEG" as the requested output format. If you work with actual files, the file extension is used in place of the format info.

Hope this explains things. If you are interested, Pillow can be found on GitHub and you can see the source.

KevinH
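
One way to apply this in a plugin is a small lookup from file extension to format name. The helper below is a hypothetical sketch covering only a few common types; "JPEG", "PNG" and "GIF" are format names Pillow accepts, but the function itself is not part of Pillow or Sigil.

```python
import os

# Extension-to-format table; deliberately small, extend as needed.
PILLOW_FORMATS = {".jpg": "JPEG", ".jpeg": "JPEG", ".png": "PNG", ".gif": "GIF"}

def pillow_format_for(filename):
    """Return the format name to pass to img.save() for a given file name."""
    ext = os.path.splitext(filename)[1].lower()
    try:
        return PILLOW_FORMATS[ext]
    except KeyError:
        raise ValueError("no format known for {!r}".format(filename))

# Intended use (requires Pillow, so shown as a comment only):
#     img.save(imagedata, pillow_format_for('cover.jpg'))
print(pillow_format_for("cover.jpg"))
```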

CalibUser 09-21-2015 03:11 PM

Thanks for the clarification, KevinH.

gipsy 09-21-2015 04:02 PM

Hi,
I tried to add another span replacement: if you have a <span style="font-variant:small-caps;">, replace it with \U\1\E. But it doesn't work :P

I added a
Code:

                                elif comboBox[i].get() == "Change to UPPER":
                                        tagtoprocess=spanTagList[i]+"(.*?)</span>"
                                        html=re.sub(tagtoprocess, r'\U\1\E', html)

and a "Change to UPPER" entry in cbChoices, without any result of course.

I searched and found that if you want the
Code:

\Uabc\Edef
it must be written as
Code:

re.search("abc".upper() + "def", var)
in Python. But I can't get it to work with my zero coding abilities :)

Thanks!
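
Python's re module does not implement the \U...\E case-conversion escapes, but re.sub() accepts a function as the replacement, and that function can call .upper() on the captured group. The sketch below shows the idea. The helper name is made up, and it assumes the span tag is available as a literal string (as spanTagList[i] appears to be), so the tag is escaped before being used in the pattern.

```python
import re

def uppercase_span_contents(html, span_tag):
    """Replace span_tag...</span> with the upper-cased contents, dropping the span itself."""
    pattern = re.escape(span_tag) + r"(.*?)</span>"
    return re.sub(pattern, lambda m: m.group(1).upper(), html, flags=re.DOTALL)

tag = '<span style="font-variant:small-caps;">'
html = '<p>' + tag + 'Chapter one</span> begins here.</p>'
result = uppercase_span_contents(html, tag)
print(result)
```

Note that .upper() is applied to everything inside the span, so any nested tags or entities there would be upper-cased as well.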

gipsy 09-23-2015 04:57 AM

CalibUser, is it possible to use the code for the hyphens fix with other characters?
For example, sometimes you get
Code:

"ύ" instead of "έ"
"ο" instead of "σ"


Thanks
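
One possible approach, sketched here under assumptions: if a word is not in the dictionary, try each known character confusion and accept the result only when it produces a dictionary word. This is not how the plugin's hyphens fix is implemented; the function names, the confusion list and the tiny word set are all illustrative.

```python
def substitutions(word, wrong, right):
    """Yield copies of word with one occurrence of wrong replaced by right."""
    start = word.find(wrong)
    while start != -1:
        yield word[:start] + right + word[start + len(wrong):]
        start = word.find(wrong, start + 1)

def fix_confusion(word, dictionary, confusions):
    """Return a dictionary word reachable by a single substitution, else the word unchanged."""
    if word in dictionary:
        return word
    for wrong, right in confusions:
        for candidate in substitutions(word, wrong, right):
            if candidate in dictionary:
                return candidate
    return word

CONFUSIONS = [("ύ", "έ"), ("ο", "σ")]  # (misread character, intended character)
WORDS = {"στο", "έλα", "όλα"}          # stand-in for a real word list
fixed = [fix_confusion(w, WORDS, CONFUSIONS) for w in ["οτο", "ύλα", "όλα", "ξψζ"]]
print(fixed)
```

Words that are already in the dictionary are left alone, which avoids "correcting" valid words that happen to contain one of the confusable characters.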


All times are GMT -4.