Old 04-03-2014, 06:12 PM   #616
PeterT
Grand Sorcerer
Posts: 12,119
Karma: 73448614
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
Just tried the plugin on a genuine kEpub, then loaded it into Edit and ran a check, which reported several occurrences of opening and ending tag mismatches.

If you'd like to PM me an eMail address I'll supply a copy of the book I am testing with.
Old 04-03-2014, 06:28 PM   #617
Rev. Bob
Wizard
Posts: 1,760
Karma: 9918418
Join Date: Feb 2013
Location: Here on the perimeter, there are no stars
Device: Kobo H2O, iPad mini 3, Kindle Touch
Quote:
Originally Posted by PeterT View Post
Just tried the plugin on a genuine kEpub, then loaded it into Edit and ran a check, which reported several occurrences of opening and ending tag mismatches.

If you'd like to PM me an eMail address I'll supply a copy of the book I am testing with.
PM sent, and I've updated the plugin code in my original message.
Old 04-05-2014, 06:24 PM   #618
Rev. Bob
Quote:
Originally Posted by Perkin View Post
@Rev. Bob
This is a modified 'modify epub' plugin that contains the changes.
Please note that I haven't tested it as I haven't got a file that needs modifying.
Quote:
Originally Posted by PeterT View Post
Just tried the plugin on a genuine kEpub, then loaded it into Edit and ran a check, which reported several occurrences of opening and ending tag mismatches.

If you'd like to PM me an eMail address I'll supply a copy of the book I am testing with.
Having received the book in question and run some tests on it, the problem seems to stem from the entity-match routine getting thrown out of sync by a self-closed A tag:

Quote:
Originally Posted by Before
<p class="indent"><span id="kobo.8.9">Blue and white self-seeded <a id="page_208"/>campanulas totter.</span> Hollow-stemmed <em>Galega</em>, all knees and elbows, capsizes over the path. The white ox-eye daisies, for a few weeks a wonderful swaying sea of white, have collapsed into an untidy straggle.</p>
Quote:
Originally Posted by After
<p class="indent">Blue and white self-seeded <a id="page_208"/>campanulas totter.</span> Hollow-stemmed <em>Galega</em>, all knees and elbows, capsizes over the path. The white ox-eye daisies, for a few weeks a wonderful swaying sea of white, have collapsed into an untidy straggle.
The latest code (still with the de-Kobify and <span> stripping as one routine) is attached. Perkin, can you take a look?
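For anyone who wants to reproduce the mismatch outside the editor, here is a minimal sketch. It assumes Python's standard xml.etree.ElementTree as a stand-in for whatever check the editor runs, and it uses shortened copies of the Before and After snippets; the helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened versions of the Before/After snippets quoted above.
before = ('<p class="indent"><span id="kobo.8.9">Blue and white self-seeded '
          '<a id="page_208"/>campanulas totter.</span> Hollow-stemmed</p>')
after = ('<p class="indent">Blue and white self-seeded '
         '<a id="page_208"/>campanulas totter.</span> Hollow-stemmed')


def is_well_formed(fragment):
    """Return True if the fragment parses as one well-formed XML element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(before))  # True
print(is_well_formed(after))   # False
```

The After fragment fails because the closing span no longer has an opening partner, which is the kind of mismatch the check reported.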
Attached Files
File Type: zip Modify ePub - stripspans.zip (150.6 KB, 174 views)
Old 04-06-2014, 05:22 AM   #619
Perkin
Guru
Posts: 655
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
The self-closing tags should have been caught in the elif clauses.

Change the -1 to a -2 in this line, so that the slice is two characters long and can actually match '/>':
Code:
elif entity[-1:] == '/>':
Edit: that's line 590 in modify.py in the zip from your last post.
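Why the original test never fired: a slice taken with [-1:] is at most one character long, so it can never equal the two-character string '/>'. A small sketch of that; the name entity comes from the line above, while the sample tag is only an illustration.

```python
entity = '<a id="page_208"/>'

# [-1:] yields only the final character, so comparing it to '/>' is always False.
print(entity[-1:])            # >
print(entity[-1:] == '/>')    # False

# [-2:] yields the final two characters, which is what the test needs.
print(entity[-2:] == '/>')    # True

# str.endswith avoids keeping the slice length and the literal in step by hand.
print(entity.endswith('/>'))  # True
```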

Last edited by Perkin; 04-06-2014 at 05:46 AM.
Old 04-06-2014, 06:07 AM   #620
Terisa de morgan
Grand Sorcerer
Posts: 6,211
Karma: 11766195
Join Date: Jun 2009
Location: Madrid, Spain
Device: Kobo Clara/Aura One/Forma,XiaoMI 5, iPad, Huawei MediaPad, YotaPhone 2
I'm testing this because the <span></span> tags are among the things I most dislike in an HTML file. Thank you very much.
Old 04-06-2014, 11:30 PM   #621
PeterT
Quote:
Originally Posted by Perkin View Post
The self closing tags should have been caught in the elif clauses

change the -1 to a -2 in the line
Code:
elif entity[-1:] == '/>':
Edit: that's line 590 in modify.py in the zip from your last post.
I made that change in my installed script and saw no issues with my test-case book, the one that had been triggering errors.

I can't help but wonder if you might also want to remove the
Code:
<a id="page_44"/>
tags?

Last edited by PeterT; 04-06-2014 at 11:33 PM.
Old 04-07-2014, 12:34 AM   #622
Rev. Bob
Quote:
Originally Posted by PeterT View Post
I made that change in my installed script and saw no issues with my test case book; the one that had been triggering errors.

I can't help but wonder if you might also want to remove the
Code:
<a id="page_44"/>
tags?
Those actually have a function; they correspond to the print book's pages. They don't do any harm, might do some good (depending on the NCX structure), and can't easily be restored if deleted, so I'm inclined to leave 'em alone.
Old 04-07-2014, 05:18 AM   #623
Perkin
They could also be removed fairly simply with a search-and-replace if they're not wanted; that's nowhere near as difficult as the problem that prompted these additions to the plugin.

Are they consistently declared across ebooks with their 'page_###'?

Anyone who is offended by them would be editing a book, so would be able to do the simple s&r: Regex search
Code:
<a id="page_\d+"/>
replace with nothing
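Outside the editor, the same search can be run from Python. This is a rough sketch, assuming the anchors always look exactly like the pattern above (double quotes, no other attributes, no space before the slash); the function name strip_page_anchors is made up for illustration.

```python
import re

# Same pattern as the editor search above.
PAGE_ANCHOR = re.compile(r'<a id="page_\d+"/>')


def strip_page_anchors(html):
    """Remove self-closed print-page anchors such as <a id="page_208"/>."""
    return PAGE_ANCHOR.sub('', html)


sample = '<p>Blue and white self-seeded <a id="page_208"/>campanulas totter.</p>'
print(strip_page_anchors(sample))
# <p>Blue and white self-seeded campanulas totter.</p>
```

Since the anchors can't easily be restored once deleted, this is best tried on a copy of the book.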
Old 04-07-2014, 11:45 AM   #624
Rev. Bob
Quote:
Originally Posted by Perkin View Post
The self closing tags should have been caught in the elif clauses

change the -1 to a -2 in the line
Code:
elif entity[-1:] == '/>':
Edit: that's line 590 in modify.py in the zip from your last post.
I've made that change, and I've found another bug: if HR, BR, or IMG are coded as no-content containers rather than self-closing elements (stupid, but legal), the closing tags are removed but the opening tag is not converted to self-closing.

In other words, <hr></hr> is truncated to a bad <hr> instead of converted to a correct <hr/>.

The culprit seems to be the logic in lines 590-591 of the attached version's modify.py, in which those elements are always assumed to be self-closing:

Code:
elif entity[:3] == '<hr' or entity[:3] == '<br' or entity[:4] == '<img':
    this_entity.e_type = 3
To dodge that bug, I've simply commented that test out for now. Thus, those elements are tested like every other element, and the bad-but-okay form is preserved - but it would be nice if <foo a="x" b="y"></foo> could be converted to <foo a="x" b="y"/> across the board. I'm just not sure how to modify your code to do so.
Attached Files
File Type: zip Modify ePub - stripspans.zip (150.6 KB, 184 views)

Last edited by Rev. Bob; 04-07-2014 at 11:47 AM.
Old 04-07-2014, 12:27 PM   #625
Perkin
Quote:
Originally Posted by Rev. Bob View Post
I've made that change, and I've found another bug: if HR, BR, or IMG are coded as no-content containers rather than self-closing elements (stupid, but legal), the closing tags are removed but the opening tag is not converted to self-closing.

In other words, <hr></hr> is truncated to a bad <hr> instead of converted to a correct <hr/>.

The culprit seems to be the logic in lines 590-591 of the attached version's modify.py, in which those elements are always assumed to be self-closing:

Code:
elif entity[:3] == '<hr' or entity[:3] == '<br' or entity[:4] == '<img':
    this_entity.e_type = 3
To dodge that bug, I've simply commented that test out for now. Thus, those elements are tested like every other element, and the bad-but-okay form is preserved - but it would be nice if <foo a="x" b="y"></foo> could be converted to <foo a="x" b="y"/> across the board. I'm just not sure how to modify your code to do so.
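After a quick test (and a bit of research), that last conversion can be done quite simply...
Code:
#!/usr/bin/env python

import re

# Collapse an empty element such as <foo a="x" b="y"></foo> or <hr></hr>
# into its self-closing form. \2 is the tag name captured from the opening tag.
pattern = r'(<([A-Za-z][\w:.-]*)(?:\s[^>]*)?)></\2\s*>'
result = re.sub(pattern, r'\1/>', '<foo a="x" b="y"></foo>')
print(result)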
Old 04-07-2014, 12:38 PM   #626
Perkin
I also meant to say something regarding the non-self-closing tags.
IIRC, a lot of the elifs were there to cut processing time in the pairing routine by removing many of the elements it needed to check; those (HR, BR, IMG), as you say, should be self-closing.

The previous elif would be catching the non-self-closing opening tags but not their end tags, thus causing the mismatch (which you probably realise).

One way to change it to not catch them would be to add a 1 and a space to each of the tests...
Code:
elif entity[:4] == '<hr ' or entity[:4] == '<br ' or entity[:5] == '<img ':
    this_entity.e_type = 3
Edit: But the elif with the '/>' would be catching them anyway.

Last edited by Perkin; 04-07-2014 at 12:42 PM.
Old 04-07-2014, 01:00 PM   #627
Rev. Bob
Quote:
Originally Posted by Perkin View Post
I also meant to say regarding the non self-closing tags.
IIRC, a lot of the elifs were there to cut processing time in the pairing routine by removing many of the elements it needed to check; those (HR, BR, IMG), as you say, should be self-closing.
Should be, yes, but XHTML does not force them to be; the <hr></hr> construction is exactly as valid as <hr/>, even though the latter is the preferred form.

Quote:
Originally Posted by Perkin
One way to change it to not catch them would be to add a 1 and a space to each of the tests...
Not valid; all three of those elements can (and commonly do) have attributes, which would match the new test.

Quote:
Originally Posted by Perkin
Edit: But the elif with the '/>' would be catching them anyway.
That's why I just completely commented out the HR/BR/IMG test. It'd be nice to convert 'em to self-closing, but not touching them is preferable to breaking them.
Old 04-07-2014, 01:04 PM   #628
Perkin
Quote:
Originally Posted by Rev. Bob View Post
Not valid; all three of those elements can (and commonly do) have attributes, which would match the new test.
Yeah, I was thinking (again) of the self-closing tags and not the non-self-closing ones.
Old 04-07-2014, 01:08 PM   #629
Rev. Bob
Quote:
Originally Posted by Perkin View Post
Yeah, I was thinking (again) of the self-closing tags and not the non-self-closing ones.
Either form can have attributes; <br class="calibre1"/> is pretty common, and all IMG elements have to have SRC and ALT attributes.
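To make that concrete: a prefix test with a trailing space fires for both forms as soon as the element carries an attribute, so it cannot tell them apart. A quick sketch with made-up sample tags; prefix_test mirrors the suggested test, and is_self_closed is shown only as one possible alternative, not as what modify.py does.

```python
def prefix_test(entity):
    """The suggested test: tag name followed by a space."""
    return entity[:4] == '<hr ' or entity[:4] == '<br ' or entity[:5] == '<img '


def is_self_closed(entity):
    """Look at how the tag ends instead of how it starts."""
    return entity.endswith('/>')


samples = [
    '<br class="calibre1"/>',  # self-closing, with an attribute
    '<br class="calibre1">',   # opening tag of <br class="calibre1"></br>
    '<hr/>',                   # self-closing, no attribute
    '<hr>',                    # opening tag of <hr></hr>
]

for entity in samples:
    print(entity, prefix_test(entity), is_self_closed(entity))
```

The first two samples both pass the prefix test even though only the first is self-closing, while the ending check separates all four correctly.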
Old 04-07-2014, 02:27 PM   #630
Perkin
Just been re-reading the last few pages of the thread.

This was from post #574 made on 30-March
Quote:
Originally Posted by Rev. Bob View Post
Similarly, an option to remove SPAN tags with no attributes, and to merge adjacent SPAN tags with the same attributes, would also be nifty. Phoenix Pick is horrible about that last one. Merging adjacent or empty bold/italic tags would also be useful; cases like </i><i>, <i/>, and <b></b> should always be removed.
I thought about this, and it could cause problems unless the preceding opening tag has no extra class or styles; that could only be caught reliably with a full parser rather than simple s&r's.

Say you had
Code:
<i class="something">Here's</i><i> some text</i>
and removed the </i><i>, then the ' some text' would also get the 'something' class styling, which isn't what was wanted.
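The effect is easy to reproduce. A small sketch using plain string replacement on the example above; the guarded pattern is only an illustration of a more cautious rule (merge only when the first tag is a bare <i> around plain text) and is nowhere near a full parser.

```python
import re

html = '<i class="something">Here\'s</i><i> some text</i>'

# Naive search-and-replace: the second run of text now sits inside the
# first element and picks up class="something".
naive = html.replace('</i><i>', '')
print(naive)
# <i class="something">Here's some text</i>

# More cautious (still simplistic): only merge when the first <i> is bare
# and contains plain text, so no styling can leak across.
GUARDED = re.compile(r'<i>([^<]*)</i><i>')
guarded = GUARDED.sub(r'<i>\1', html)
print(guarded == html)  # True: this example is left untouched

plain = '<i>Here\'s</i><i> some text</i>'
print(GUARDED.sub(r'<i>\1', plain))
# <i>Here's some text</i>
```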

Also thought about removing the empty tags i.e. '<i/>' or even '<i></i>'

Could there be a reason not to remove them? Maybe they change the layout somehow, so that the layout differs once they're removed.

Somewhere in my mind there's a niggle: something to do with seeing an empty tag that altered line spacing or widths or something similar.

Anyway maybe it's more of a 'Do them by hand' rather than automated.